THE ONSET OF ACADEMIC UNDERACHIEVEMENT IN BRIGHT CHILDREN*
MERVILLE C. SHAW 
Chico State College 


and JOHN T. McCUEN
Monterey Peninsula College 


The problem of the onset of academic underachievement among bright children has been the subject of some speculation, but very little research. After reviewing the findings of their study on able high school underachievers, Shaw and Grubb (1958) hypothesize "that underachievement among bright students is not a problem which has its genesis within the educational framework, but rather one which the underachiever brings with him, at least in embryo form, when he enters high school." In an intensive study of a small number of gifted underachievers, Barrett (1957) found an underachievement pattern present by Grade five, but did not investigate the grades below five. No studies were found in the literature that attempted to determine whether there is any particular grade level at which underachievement begins, or at what level it begins.


PROBLEM 


The purposes of the present study were to determine whether there is any specific academic level at which academic underachievement can be said to begin, and to discover the subsequent pattern of achievement. The information resulting from the study has both practical and theoretical implications. On the practical side, the problems of prevention and remediation of academic underachievement might conceivably be affected by such results. Broedel, Ohlsen and Proff (1959) confirmed the hypothesis of Shaw and Brown (1957) that underachievement among high school sophomores is not a surface phenomenon which is easily modifiable, but rather is related to the basic personality matrix of the individual. If academic underachievement is related to basic personality structure, then such behavior is likely to occur during the early elementary school years. Specific information regarding the point at which underachievement actually begins has implications for both the preventive and the remedial measures that may be undertaken.

*The authors would like to express their sincere thanks to Theron L. McCuen, Superintendent of the Kern County Union High School District, and the administrative staffs of both Bakersfield and East Bakersfield high schools for their cooperation in this study.

Such information also has implications from a theoretical point of view. The problem of the genesis of achievement motivation has been a topic of concern to McClelland and his associates (McClelland, Atkinson, Clark, & Lowell, 1953). Hypothesizing that the scores of college males on the McClelland Achievement Motivation Test would be affected by the child-rearing practices of their parents, they were able to isolate certain differences between subjects who received high scores and those who received low scores. Their criterion of achievement, the MAMT, has not been validated as a predictor of academic achievement, however, and it would not be reasonable to conclude from their results that academic underachievement has its origins in parental child-rearing practices. Should it be found that academic underachievement is present in the earliest school years, and is found with some consistency in the same individuals throughout their school careers, more credence could be placed in the general findings of McClelland et al. (1953) as they relate to academic underachievement.


METHOD 


The general plan of the study was to select Ss who were in the upper 25% of the school population with regard to ability and to classify them as achievers or underachievers on the basis of their cumulative grade-point averages in Grades 9, 10, and 11. The intelligence measure used was the Pintner General Ability Test: Verbal Series, which was administered to all Ss included in the study when they were in Grade 8. A student whose intelligence test score placed him in the upper 25% of the population (over 110) and who had earned a grade-point average below the mean of his class was classified as an underachiever. A student who earned a GPA above the average of his class, and whose IQ was over 110, was classified as an achiever. Those who fell exactly at the class average, which was 2.40 on a four-point scale, were not included in the study. Only eleventh- and twelfth-graders were included.
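The classification rule just described can be sketched as a small function. This is a minimal illustration, not the authors' procedure: the function name `classify`, the strict reading of "over 110" as IQ > 110, and the default class mean of 2.40 are assumptions taken from the text, and the other inclusion criteria (single-district schooling, eleventh or twelfth grade) are not modeled.

```python
from typing import Optional

IQ_CUTOFF = 110         # "upper 25% of the population (over 110)", read as IQ > 110
CLASS_MEAN_GPA = 2.40   # class average on a four-point scale, per the text


def classify(iq: float, cumulative_gpa: float,
             class_mean: float = CLASS_MEAN_GPA) -> Optional[str]:
    """Hypothetical helper: return 'achiever', 'underachiever', or None if excluded.

    cumulative_gpa is the Grades 9-11 cumulative grade-point average.
    """
    if iq <= IQ_CUTOFF:
        return None                 # not in the able (upper 25%) group
    if cumulative_gpa < class_mean:
        return "underachiever"      # able, but below the class mean
    if cumulative_gpa > class_mean:
        return "achiever"           # able, and above the class mean
    return None                     # exactly at the class mean: excluded
```

For example, `classify(120, 2.10)` labels an able student with a below-average GPA as an underachiever, while a student exactly at 2.40 is dropped from the sample.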

A further criterion for the inclusion of an S in this study was that he must have attended school only in the school district served by the high schools in the study. All Ss, then, had received all of their formal education in a single school district. This criterion was established in order to reduce the variability in grades and educational philosophy that would be introduced by the inclusion of Ss who had moved from one school district to another.

TABLE 1

SIGNIFICANCE OF DIFFERENCES BETWEEN MALE AND FEMALE ACHIEVERS AND UNDERACHIEVERS ON THE PINTNER GENERAL ABILITY TEST

         Pintner Standard Score Means
Sex      Achievers   Underachievers   F      P      t      P
Male     81.53       80.14            1.03   ns*    1.01   ns*
Female   81.56       80.65            1.54   ns*    .63    ns*

* Not significant at the .05 level.

A single high school district with two large high schools, whose combined enrollment was over six thousand, was selected for use in the study. In addition to the factor of size and the presence of a fairly representative population from the socioeconomic point of view, it was also important to conduct the study in a school system where specific letter grades (A, B, C, etc.) were used at both the elementary and high school levels.

One hundred sixty-eight students met all of the criteria for inclusion in the study. For purposes of comparison, this group was further divided into four subgroups: male Achievers, male Underachievers, female Achievers, and female Underachievers. Much research has shown the necessity of treating males and females separately in studies of underachievement. In order to obtain groups whose mean intelligence scores were not significantly different, it was necessary to eliminate § males and 18 females from the sample. All of those eliminated were from the Achiever groups. The F and t tests were used to ensure that the groups compared had both comparable variances and comparable means. These results are reported in Table 1. The final groups consisted of 36 male Achievers, 36 male Underachievers, 45 female Achievers, and 17 female Underachievers.

Following the final selection of Ss, the academic record for each student from Grades 1 through 11 was obtained. In the case of elementary school grades, it was necessary to convert from letter to number grades. This was done on a four-point scale to keep elementary grades comparable to high school grades. Thus, an A became 4.0, a B became 3.0, etc. Each S's grade-point average for each grade (not a cumulative grade-point average) was then computed, followed by the mean grade-point average for each group at each grade level. Male Achievers were then compared with male Underachievers, and female Achievers with female Underachievers, on grade-point average at each grade level by means of the F and t tests.

TABLE 2

SIGNIFICANCE OF DIFFERENCES BETWEEN MEAN GRADE-POINT AVERAGES OF MALE ACHIEVERS AND UNDERACHIEVERS FROM GRADE ONE THROUGH ELEVEN

         Mean Grade-Point Average
Grade    Achievers   Underachievers   F      P        t       P
1        2.81        2.56             1.97   ns***    1.44    ns
2        2.94        2.64             1.94   ns       1.77    ns
3        3.03        2.58             1.49   ns       2.83    .01*
4        3.19        2.72             1.08   ns       2.96    .01*
5        3.28        2.75             1.02   ns       3.71    .01*
6        3.33        2.67             1.33   ns       4.46    .01*
7        3.25        2.56             1.02   ns       5.80    .01*
8        3.36        2.50             1.59   ns       6.23    .01*
9        3.25        2.14             1.32   ns       10.57   .01*
10       3.13        1.87             1.30   ns       10.24   .01*
11       2.81        1.85             4.05   .01**    5.46    .01*

* Significant beyond the .01 level.
** Significant beyond the .02 level but not at the .01 level.
*** Not significant.
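The letter-to-number conversion and the group comparisons described in the Method might be sketched as follows. The text specifies only that an A became 4.0 and a B became 3.0; the values for C, D, and F, the unweighted averaging of a year's marks, and the pooled-variance form of the t test are assumptions, as are the helper names `yearly_gpa`, `f_ratio`, and `pooled_t`. The F ratio (larger variance over smaller) checks that two groups have comparable variances before their means are compared with t.

```python
from math import sqrt
from statistics import mean, variance

# Assumed mapping; the source states only "an A became 4.0, a B became 3.0, etc."
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def yearly_gpa(letters):
    """Unweighted mean of one pupil's letter grades for a single school year."""
    return mean(GRADE_POINTS[g] for g in letters)


def f_ratio(a, b):
    """Variance ratio, larger sample variance over smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def pooled_t(a, b):
    """Student's t for two independent samples using a pooled variance estimate."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
```

In use, each group's list would hold one `yearly_gpa` value per S for a given grade level, and `f_ratio` and `pooled_t` would be applied grade by grade.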


RESULTS 


The comparison of male Achievers and Underachievers indicates that a difference significant at the .01 level is found in the GPA of the two groups beginning at the third-grade level, and that this difference increases in significance at each grade level up to Grade 10, where it decreases somewhat. It remains significant at the .01 level, however. A difference in grade-point average in favor of the Achiever group actually exists at Grade one and becomes larger at Grade two, but it is not significant at the .05 level of confidence in either of the first two grades. Results of the F and t tests are summarized in Table 2.

Fig. 1. Comparison of the achievement patterns of male achievers and underachievers from Grade 1 through 11.

Graphic presentation renders these results even more striking. Figure 1 indicates that while the general trend of grades in both groups tends to be the same, there is never any overlap. It also shows clearly the decline in mean difference between the two groups at the tenth- and eleventh-grade levels, which is due primarily to a drop in the mean grade-point average of the Achievers rather than a rise in the grades of the Underachievers.

Comparison of the female Achievers and Underachievers presents quite a different picture from that seen in the male groups. Through Grade five the Underachievers actually exceed the Achievers in GPA, although not at a significant level of confidence. At Grade six the Achievers obtain a higher mean GPA for the first time, and from that point until Grade 10 this difference increases every year, although it does not reach significance until Grade nine. From Grade 9 through 11 the difference is significant at the .01 level. These results are summarized in Table 3.

TABLE 3

SIGNIFICANCE OF DIFFERENCES BETWEEN MEAN GRADE-POINT AVERAGES OF FEMALE ACHIEVERS AND UNDERACHIEVERS FROM GRADE ONE THROUGH ELEVEN

         Mean Grade-Point Average
Grade    Achievers   Underachievers   F      P       t      P
1        2.93        3.06             1.19   ns**    .65    ns
2        3.02        3.12             1.33   ns      .53    ns
3        2.96        3.24             2.37   ns      1.59   ns
4        3.02        3.18             1.07   ns      .87    ns
5        3.13        3.18             1.30   ns      .25    ns
6        3.29        3.18             1.08   ns      .59    ns
7        3.02        2.76             1.08   ns      1.59   ns
8        3.13        2.82             1.56   ns      1.73   ns
9        3.06        2.24             1.40   ns      6.46   .01*
10       3.08        2.05             2.22   ns      8.66   .01*
11       2.96        2.11             1.33   ns      6.69   .01*

* Significant beyond the .01 level.
** Not significant.

As in the case of the data on males, the data on females are most clearly understood through graphic presentation. Figure 2 contrasts these two groups. As was the case with the male groups, there is again a tendency for the difference between the mean grade-point averages of the two groups to diminish slightly in the last year of high school, and again this can be accounted for by a drop in the grade-point average of the Achiever group rather than an increase in that of the Underachiever group.

With regard to the male Underachievers, it would appear reasonable to say that the predisposition to underachieve academically is present when the Underachiever enters school. It is also safe to say that, in comparison to the Achiever controls, the problem becomes steadily more serious until Grade 10, at which time it becomes only slightly less serious, due primarily to a drop in grade-point average on the part of the Achiever group.

Fig. 2. Comparison of the achievement patterns of female achievers and underachievers from Grade 1 through 11.

Comparison of the female groups does not present nearly so clear-cut a picture. As has been found in other studies of underachievement, there is a great deal of difference between what we find to be true of males and what seems to be true in the case of females. The present study provides no clues that would explain why Underachieving females actually tend to do better than Achieving females in the first five grades, nor do we have any facts that would explain the precipitous drop they take beginning in Grade six. The fact that actual underachievement among the female group does not show itself until Grade six does not completely rule out the possibility of the presence of a predisposing factor at the time the female Underachiever enters school. The timing of the drop in GPA of female Underachievers is just about right for the start of puberty. We may hypothesize that females do not display their self-directing tendencies to the same extent that males do until they approach adolescence.

Another justifiable conclusion to be drawn is that underachievement is not a temporary phenomenon in the life of these Ss, but rather is chronic in nature. In comparison to the control group, the male Underachievers have been obtaining grades below their ability level since Grade one. The female Underachievers have been performing below their ability level since Grade nine, and have tended to do so since Grade six. This finding lends weight to the previously stated hypothesis that academic underachievement is not an easily modifiable surface phenomenon.

The most obvious implication of the study is the need for the early identification of Underachievers. At the present time, very little deliberate identification of such students is taking place, and the work that is being done tends to be at the high school level. Comparison of the studies of Calhoun (1956) and Kirk (1952) at least suggests that while counseling with Underachievers may prove successful at all levels, it requires less time with younger students.

Much more intensive research than has yet been done is needed with the parents of Underachievers. McClelland et al. (1953) suggest that the parents of Underachievers do not demand a high level of performance from their children. The present study found many more male than female Underachievers. According to McClelland's hypothesis, this would suggest that a higher level of academic performance is demanded of females than of males. Observation would not appear to support this idea, but it certainly needs intensive study.

What are the factors in the school situation that tend to reinforce the Underachiever's predisposition to underachieve, and what are the conditions that might forestall its appearance? These too are important topics for further study.

SUMMARY

Groups of Achievers and Underachievers with IQs over 110, grouped by sex, were compared on grade-point average at every grade level from one through eleven. All subjects in the study were from a single school district and had gone all the way through school in that district. Subjects were classified as Achievers or Underachievers on the basis of the cumulative grade-point average they earned in Grades 9, 10, and 11.

Results for males indicated that the Underachievers tended to receive grades lower than the Achievers beginning in Grade one, and that this difference became significant at the .01 level at Grade 3. From Grade 3 to Grade 10, the difference increased in significance every year. In Grades 10 and 11 the difference was reduced somewhat, but remained significant at the .01 level. The decrease in these grades was due to a slight drop in the grades of the Achievers.

Results for females indicated that female Underachievers actually exceeded Achievers in grade-point average for the first five years of school, although not at a significant level of confidence. Beginning in Grade 6, Underachievers began a precipitous drop in grade-point average and remained below the Achiever group from Grade 6 through Grade 11. The difference became significant at Grade 9. It became slightly smaller at Grade 11, due primarily to a drop in the grades of Achievers at that level.
A COMPARISON OF UNDER- AND OVERACHIEVING FEMALE COLLEGE STUDENTS


MABEL K. M. LUM 


Cornell University 


That motivation is an important variable in determining achievement in any field of endeavor is axiomatic. Yet research workers have found that the motivational factors associated with academic success have not yielded as readily to quantitative treatment as have the intellectual factors. Part of the difficulty may arise from the fact that many of the instruments used in these studies were developed for use in other prediction problems, usually clinical. Further study of motivational variables in academic achievement, using measures tailored to the academic situation, appears to be justified. The general hypothesis proposed here is that students who underachieve differ significantly from those who overachieve in their motivation for studying and in their attitudes toward various aspects of the academic situation.

The role of study habits in college achievement is another area which calls for more systematic investigation. Although this factor has been studied repeatedly during the past three decades, no definitive results have emerged (Gordon, 1941; Harris, 1931, 1940; Locke, 1940; Schultz & Green, 1953; Myers & Schultz, 1950). It was hypothesized in this study that underachievers do not differ significantly from overachievers in the reported use of effective study habits.


PROCEDURE 


Subjects 


Three experimental groups were drawn from two classes in introductory psychology at the University of Hawaii. The selection procedure was designed to yield three groups of 20 students each who would be equated for scholastic aptitude, as measured by the American Council on Education Psychological Examination (ACE), but who would differ widely in achievement, as expressed in the cumulative grade-point ratio. It was also felt that, as far as possible, the three groups should be homogeneous with respect to certain relevant variables, including sex, chronological age, and college class. The largest single grouping that permitted both categorization into the experimental subgroups and further matching on variables other than aptitude was that of female sophomore students.

The correlation between ACE total score and freshman grade-point ratio was determined for all female sophomores for whom admissions data and grades were available. Freshman grade-point ratios were then predicted on the basis of a regression formula, and those students whose obtained grade-point ratios fell below their predicted ratios by at least one-half SD were designated as underachievers (U). The overachievers (O) earned grades higher than would be expected on the basis of their predicted ratios, by at least one-half SD. A third group, composed of those students whose grades fell within one-half SD of their predicted ratios, was designated as normal achievers (N).

The experimental subgroups were further refined by matching for group means in ACE score and chronological age. Table 1 presents a summary of the comparison of groups on selected variables. It will be seen that the combined groups do not differ on the basis of scholastic aptitude, but do differ significantly in academic average. Translated into letter grades, the mean grade-point ratios show that the overachievers earned approximately a B average at the end of their freshman year. Normal achievers, on the other hand, earned only slightly better than a C average, while the underachievers earned roughly a D plus average. A wider separation between the under- and normal-achieving groups was not obtained, since at this university students with less than a 1.5 ratio at the end of their freshman year are denied further registration. Nevertheless, the difference in average grades obtained by even these two groups is highly significant.

TABLE 1

COMPARISON OF UNDER-, NORMAL-, AND OVERACHIEVEMENT GROUPS WITH RESPECT TO ACE TOTAL SCORE, GRADE-POINT RATIO, AND CHRONOLOGICAL AGE

Achievement         ACE Total Raw Score    Grade-Point Ratio    Chronological Age
Group         N     M        SD            M        SD          M        SD
Under         20    99.3     16.0          1.75     .27         18.8     .43
Normal        20    100.3    16.0          2.14     .25         19.2     1.15
Over          20    99.8     18.8          2.88     .51         19.1     .70
t (U vs. N)         .20                    4.88*                1.54
t (N vs. O)         .09                    5.54*                .13
t (U vs. O)         .09                    8.54*                .43

* Significant beyond the .001 level.

The groups appeared to be roughly comparable with respect to college representation, with 75% of the total sample being enrolled in the College of Education. With respect to ethnic background, nearly 75% of the sample was of Japanese ancestry. The rest of the Ss represented Chinese, part-Hawaiian, Filipino, and mixed ethnic groups.

All Ss were unmarried and in the first semester of their sophomore year at the time of testing. No S was included whose grade-point ratio was based on course work completed at another institution.

The forcing of homogeneity on aptitude and other variables necessarily limited the size of the experimental sample. In addition, it was found necessary to restrict the sample to one sex when preliminary inspection of academic records revealed a significantly disproportionate number of males in the underachieving category and of females in the overachieving category. Consequently, the data were treated with caution and should be interpreted in view of the restrictions placed upon this investigation.


Instruments 


An experimental form of the Survey of Study Habits and Attitudes (SSHA) by Brown and Holtzman was made available to this study by the authors of the inventory.¹ With the exception of an additional 25 items and minor changes in the wording of certain of the original items, the recent form is identical with the standardized form (Brown & Holtzman, 1953) in design and content. A distinct advantage of the revised form is that it permits a precise use of the SSHA through six specially developed 16-item subscales.

Six sentence stems suggesting achievement motivation and attitudes toward the college situation were drawn from the Dole Vocational Sentence Completion Blank (Dole, 1952). The selected items were as follows:

Success usually comes as a result of
What I like least about school
The thing I like best about school
I hope that my education will
If one fails in school
In my case, success

The incomplete sentences were administered before the SSHA to ensure that the Ss' responses would reflect their own feelings rather than phrases or ideas suggested by the structured inventory. Before the content of the sentence completion responses was analyzed, code numbers were substituted for Ss' names on the protocols. This was done to eliminate, insofar as possible, subjective bias in the content analysis procedure. Thus, at no time during the categorization of the data did the investigator know the classification of the S whose responses were being considered. Analysis of the protocols consisted of classifying responses according to the categories provided by the Dole manual.

¹The writer wishes to express her appreciation to William F. Brown and Wayne H. Holtzman for making this form available for experimental purposes.


RESULTS 


SSHA 


Table 2 shows the group means on the SSHA subscales and total score. Comparisons among the three groups were made by means of t ratios for small samples. It will be noted that the underachievers did not differ significantly from normal achievers on total score, although the difference is in the expected direction. On the other hand, overachievers differed significantly from both normal achievers (.05 level) and underachievers (.001 level) on SSHA total score.
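For two groups of equal size N, the pooled-variance t ratio reduces to t = (M2 - M1) / sqrt((SD1^2 + SD2^2) / N). The sketch below, with a hypothetical function name, applies that formula to the Total score means and SDs in Table 2 with N = 20 per group. It is one reading of "t ratios for small samples" that reproduces the tabled Total score values (1.59, 2.18, 3.97), not necessarily the computation the author used.

```python
from math import sqrt


def t_from_summary(m1, sd1, m2, sd2, n):
    """t ratio for two independent groups of equal size n, from means and SDs.

    With equal n, the pooled-variance standard error of the difference
    simplifies to sqrt((sd1**2 + sd2**2) / n).
    """
    return (m2 - m1) / sqrt((sd1 ** 2 + sd2 ** 2) / n)


# SSHA total score as (mean, SD) for each group, N = 20 per group (Table 2)
under, normal, over = (23.60, 8.91), (28.00, 8.62), (33.10, 5.93)
pairs = [("U-N", under, normal), ("N-O", normal, over), ("U-O", under, over)]
for label, a, b in pairs:
    print(label, round(t_from_summary(*a, *b, n=20), 2))
```

The same function applied to the ACE means and SDs in Table 1 gives the small, non-significant ratios reported there for aptitude.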
Statistically significant differences were obtained between over- and underachievers on four of the six subscales. The greatest difference between the two groups appeared on Subscale AD, Achievement Drive. Two other subscales separating under- from overachievers at the .01 level of confidence were Procrastination Orientation and Self-confidence. Showing slightly less discrimination between Os and Us than the preceding subscales, but nevertheless reaching statistical significance (.05 level), was Subscale EP, Educational Philosophy.

Subscale TV, Teacher Valuation, failed to distinguish between Os and Us, although the difference between means favors the overachievers.

Subscale SH, Study Habits, consistently failed to discriminate among the three groups, the means being nearly identical for all groups. It will be noted also that the group means were lowest on this scale.

Sentence Completion

Frequencies of response in each category for the various sentence stems were tallied for each group and then converted into percentages. Significant differences were obtained between Us and Os on four of the six stems, and between Ns and Os on two stems. None of the stems elicited differentiating responses between Us and Ns. The following trends were significant. Overachievers (50%) more often than underachievers (20%) cited examinations as the aspect of school least liked (.05 level). Moreover, Os differed from both Us and Ns on what they hoped to get out of college: whereas 40% of the Os mentioned the desire for an increase in personal effectiveness, 75% of the Us and 75% of the Ns indicated general, unspecified advantages (.01 level). Again, the stem "In my case, success ___" elicited responses that fell into two clearly differentiated categories. Eighty per cent of the Os gave personal definitions of success (65% and 45% for Ns and Us, respectively), whereas the Us tended to give responses referring to the attainableness of success, such as "is not in sight," "is remote," or "depends on my working harder" (.01 level). The percentages in this latter category for Us, Ns, and Os were 55%, 35%, and 20%, respectively. With regard to the aspect of school best liked, nearly three-fourths of the Ns mentioned social aspects, particularly the opportunity to form new friendships, while slightly more than half of the Us and less than one-half of the Os mentioned this aspect (.05 level). With reference to attitude toward success, 65% of the U responses indicated the conventional belief that success usually comes as a result of "hard work" or compliance with obligations (.05 level). The corresponding percentages for Ns and Os were 55% and 35%, respectively. The stem "If one fails in school ___" elicited nearly identical patterns of response in the three groups.

TABLE 2

COMPARISON OF UNDER-, NORMAL-, AND OVERACHIEVEMENT GROUPS ON SSHA SUBSCALES AND TOTAL SCORE

                             Under-          Normal-         Over-
                             achievers       achievers       achievers       t
Subscale                     M       SD      M       SD      M       SD      U-N     N-O     U-O
Study Habits                 13.20   5.05    13.40   4.80    13.95   4.10    .13     ?       .51
Educational Philosophy       16.60   3.95    18.45   4.20    19.30   4.40    1.45    .63     2.05*
Teacher Valuation            19.05   3.10    18.95   4.35    20.60   3.35    .08     1.35    1.52
Achievement Drive            14.05   5.90    17.05   5.10    19.25   4.65    1.73    1.43    3.10**
Procrastination Orientation  11.45   6.25    13.45   5.50    16.15   4.60    1.08    1.69    2.70*
Self-confidence              12.60   6.55    14.85   6.25    16.75   4.25    1.13    1.13    3.37**
Total score                  23.60   8.91    28.00   8.62    33.10   5.93    1.59    2.18*   3.97***

* Significant at the .05 level.
** Significant at the .01 level.
*** Significant at the .001 level.


DISCUSSION


The results of this study offer support to the hypothesis that the difference between successful and less successful students of similar aptitude is primarily one of attitude and motivation, rather than of reported study habits. Not only were underachievers indistinguishable from overachievers in their reported use of effective study procedures, but the obtained means indicate that even the better students do not use so-called "good" study habits with the consistency that how-to-study manuals would have them.

Academic drive thus seems to differentiate the overachiever from the underachiever. The latter tends to become easily discouraged when confronted with long or difficult assignments and admits that, unless she likes a course, she exerts only the minimum effort required to get a passing grade. The unsuccessful student also shows a marked tendency toward procrastination with regard to her assignments and tends to rely upon external pressures in order to complete them. For this student, studying is somewhat of a random proposition, dependent upon "inspired" moods. It is not surprising that she also tends to be more susceptible to distracting influences than the better student and admits to spending too much time on social activities for the good of her studies. She is more critical of educational methodology and more often expresses doubt as to the value of a college education than the overachiever.

So far as this sampling is concerned, teacher valuation does not appear to be prominently related to scholastic success. The high group means on the subscale measuring these attitudes indicate a generally favorable attitude regardless of achievement status. The fact that the Ss were predominantly of Oriental ancestry leads to some speculation as to the role of cultural factors in influencing the responses to the scale. The premium placed on education by the Chinese and Japanese, along with the fact that within the Oriental cultures the scholar has traditionally held a position of high esteem, may account in part for the lack of discriminatory ability of these items among the Ss in this study. However, the present sampling is too small to permit much more than speculation, and further research is required before any such generalizations can be made with certainty.

It is of interest to note that one of the discriminating subscales appears on closer inspection to be concerned neither with study habits per se nor with attitudes toward studying. Rather, this scale (Self-confidence) appears to measure more nearly the student's personal effectiveness in the classroom, or perhaps his emotional stability. As such, it suggests that researchers in this area might profitably delve further into the deeper layers of personality functioning in attempting to determine the reasons for academic success or failure. One might raise the basic question of what the psychodynamic functions of college are for these students. To what extent does the academic behavior of the underachiever reflect the quality of his relationship with his parents? What was the underlying motivation in attending college in the first place?

Further support for the major hypoth- 
esis that attitudinal and motivational fac- 
tors are positively related to academic suc- 
cess is provided by the responses to the 
sentence completion task. The tendency of 
overachievers to express a desire for an 
increase in personal effectiveness through 
education parallels the finding of Burgess 
(1956) who reported that overachievers 
(male) revealed a greater need to improve 
the self or status than underachievers. The 
tendency for overachievers to cite exami- 
nations as the aspect of school least liked 
is not altogether surprising in view of the
fact that academic success depends in large
measure upon performance in course ex-
aminations, and that in order to maintain
his achievement status, the high grade-
point student must reckon with this in-
evitable feature of the academic situa-
tion. A little anxiety may well be the sine
qua non of academic success.


SUMMARY 


An experimental form of the Brown- 
Holtzman Survey of Study Habits and 
Attitudes and a portion of the Dole Vo- 
cational Sentence Completion Blank were 
administered at the University of Hawaii 
to three groups of Ss (designated as under-, 
normal-, and overachievers) who were 
equated for scholastic aptitude and other 
pertinent variables, but who differed sig- 
nificantly in grades earned. Overachievers 
have stronger motivation for studying, 
tend to be more self-confident, and appear 
to have a greater capacity for working 
under pressure than underachievers. The 
latter students show a marked tendency 
to procrastinate and tend to rely upon 
external pressures to complete assign- 
ments. They are more critical of educa- 
tional methodology and philosophy than 
overachievers. Underachievers did not dif- 
fer from overachievers in their professed 
study habits. This may suggest to coun- 
selors and instructors of classes in “how- 
to-study” the importance of working on 
attitudes as well as on the mechanics and 
conditions of studying. 
THE RELATIONSHIP OF DIFFERENCES BETWEEN VERBAL
AND NONVERBAL INTELLIGENCE SCORES 
TO ACHIEVEMENT 
RICHARD O. GUNDERSEN 
Eastern State Hospital, Washington 


and LEONARD S. FELDT
State University of Iowa 


During the past two decades, multi- 
score tests of intelligence and scholastic 
aptitude have become increasingly com- 
mon. The authors of these tests suggest 
that the several scores (or quotients) and 
the differences between them are more 
useful than a single score in diagnosing 
learning problems, planning differentiated
instruction, and carrying out effective
guidance. These claims are widely, al-
though by no means universally, accepted
and acknowledged, despite the fact that
the actual uses are often only vaguely de-
fined and rarely substantiated.

For the classroom teacher the multi- 
score test is claimed to be more useful be- 
cause “it gives a clearer picture of the 
proficiencies of each child” or “it helps the 
teacher better to understand each child’s 
unique pattern of abilities.” Such claims 
are difficult to validate directly, since it is 
almost impossible to isolate the specific 
educational benefits of such understanding. 
However, it seems probable that such ad-
ditional information will be of value only
if it has achievement correlates. If dif-
ferences in profiles of achievement are not
associated with differences in profiles of
intelligence, it does not seem likely that
the “clearer picture” which the teacher ob-
tains will facilitate the main objectives of
elementary or secondary education.

The simplest form of multi-score test
yields two scores. Pattern information for
such a test may be quite readily grasped,
since it may be simply quantified by the
difference between the scores. Two-score
tests almost always include one score
based on verbal material; the second is
usually based on quantitative or abstract 
symbolic material. It is with the verbal- 
symbolic type of test that this study was 
concerned. The general purpose of the 
investigation was to ascertain whether or 
not variations in the pattern of verbal and 
nonverbal intelligence test scores may have 
genuine utility for the classroom teacher. 
An attempt was made to answer the fol- 
lowing questions: What are the achieve- 
ment correlates, if any, to a marked supe- 
riority of one quotient over the other? Do 
the special talents of the child with a dif- 
ference in favor of nonverbal intelligence 
find an outlet in school activities? Are 
teachers aware of “brightness” when it is 
revealed in nonverbal rather than verbal 
activities? 

That many children have large true dif- 
ferences in verbal and nonverbal intelli- 
gence cannot be doubted. For example, 
the manual for the California Test of Men- 
tal Maturity and related materials (Sulli- 
van, 1957; Mendenhall, 1959) indicate 
that one child in four evidences a differ- 
ence of 16 points or more. More than one 
in 20 evidences a difference of 24 points 
or more. Differences of 15 points would, 
for a single individual, be statistically sig- 
nificant at the .05 level. Hence, most of 
the children having sizeable differences 
must be considered to have true, not 
chance, differences in their verbal and non- 
verbal intelligence scores. 
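A significance threshold like the 15-point figure above is typically derived from the standard error of the difference between two scores. A minimal sketch, assuming the classical-test-theory formula SE_diff = SD × √(2 − r1 − r2) for two scores that share one standard deviation; the function name `critical_difference` and the SD and reliability values in the usage line are hypothetical and are not taken from the California manual.

```python
import math


def critical_difference(sd: float, r1: float, r2: float, z: float = 1.96) -> float:
    """Smallest difference between two scores that is significant at the
    two-tailed level implied by z (1.96 corresponds to .05).

    Assumes both scores are on the same scale with standard deviation `sd`
    and have reliabilities `r1` and `r2` (classical test theory).
    """
    if not (0.0 <= r1 <= 1.0 and 0.0 <= r2 <= 1.0):
        raise ValueError("reliabilities must lie between 0 and 1")
    se_diff = sd * math.sqrt(2.0 - r1 - r2)  # standard error of the difference
    return z * se_diff


# Hypothetical inputs for illustration only; substitute the manual's values.
print(f"{critical_difference(sd=16.0, r1=0.89, r2=0.89):.1f} IQ points")
```

Lower reliabilities widen the standard error, so the same observed gap becomes less trustworthy as either subtest becomes less reliable.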

Few experimental studies have been re- 
ported bearing directly on the usefulness 
of the multiple-score feature to the class-
room teacher. No studies could be found,
for example, to indicate that instruction 
proceeds more efficiently when teachers 
have available both verbal and nonverbal 
quotients rather than single quotients from 
an omnibus-type test. Neither have there 
been reported detailed studies clarifying 
and evaluating the specific techniques by 
which classroom teachers may use the mul- 
tiple-score information in the diagnosis 
and remediation of learning problems.
There is considerable evidence, however, 
that differences among quotients are re- 
lated to classroom performance. 

Bijou (1942), Stroud, Blommers, and 
Lauber (1957), Traxler (1958), and Hage 
and Stroud (1959) have pointed out that 
verbal tests are more closely related to 
language-centered achievement than are 
nonverbal tests. McLean (1954) and Ta- 
barlet (1958) have suggested that quite 
different achievement predictions would 
be made for subjects (Ss) who have mark- 
edly different patterns of verbal and non- 
verbal intelligence test scores. This would 
be true even though the Ss had the same 
average score on the two types of tests. 
When environmental factors militate 
against normal language development, 
children tend to score lower on verbal tests 
than on nonverbal or performance tests. 
This has been demonstrated by many in- 
vestigators (Altus, 1953; Kittell, 1959; 
Pintner, 1922; Seidl, 1937) for children 
reared in bilingual homes. Kittell (1959)
and Seashore (1951) drew a similar con- 
clusion for children whose language devel- 
opment was inhibited by factors associated 
with low socio-economic status. 

These studies suggest that the difference 
between verbal and nonverbal scores is 
potentially useful to the classroom teacher. 
However, before this value is realized, 
teachers will need considerable guidance 
in its application to instructional prob- 
lems and its relation to performance in 
various school subjects. 




PROCEDURE 


The basic procedure of this study in-
volved the comparison of test and class-
room behavior of groups of pupils having
specific patterns of verbal and nonverbal
intelligence test scores. The test used to
define the groups was the California Short-
Form Test of Mental Maturity. The Ss
were drawn from an Iowa school system
which had, in September, 1958, 522 fourth-
grade pupils in the public schools. The
median IQ for these fourth-grade children
on the California test was 111. Stroud
(1959), using the Lorge-Thorndike In-
telligence Test, Verbal Section, with 5,343
Iowa public school fifth-grade children, re-
ported a median intelligence quotient of
109 for Iowa children enrolled in the regu-
lar public elementary schools. Achieve-
ment in this school system, as measured
by the Iowa Tests of Basic Skills, was
found to be average or slightly below av-
erage in all areas in 1959. These measures
suggest that this school system is near av-
erage for the state with regard to intelli-
gence and achievement.

The difference between the Language
and Nonlanguage IQ’s on the California
Short-Form Test was used to define four
groups: (a) an Extreme Nonlanguage
Group consisting of pupils whose non-
language IQ was 24 or more points higher
than their language IQ; (b) a Moderate
Nonlanguage Group consisting of pupils
whose nonlanguage IQ was from 16 to 23
points higher than their language IQ; (c)
an Equal Group composed of pupils whose
language and nonlanguage IQ’s differed
by 8 points or less; and (d) an Extreme
Language Group consisting of pupils whose
language IQ was 24 or more points higher
than their nonlanguage IQ. The group
with difference scores ranging from −8 to
+8 served as the control group.
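The four group definitions amount to a simple rule on the difference score (nonlanguage IQ minus language IQ). A minimal sketch, with the hypothetical function name `iq_pattern_group`; differences that fall in none of the four bands (for example, a nonlanguage advantage of 9 to 15 points, or a language advantage of 9 to 23 points) are returned as `None`, since the study formed no group for them.

```python
from typing import Optional


def iq_pattern_group(language_iq: int, nonlanguage_iq: int) -> Optional[str]:
    """Assign a pupil to one of the four IQ-pattern groups, or None."""
    diff = nonlanguage_iq - language_iq  # positive = nonlanguage superiority
    if diff >= 24:
        return "Extreme Nonlanguage"
    if 16 <= diff <= 23:
        return "Moderate Nonlanguage"
    if -8 <= diff <= 8:
        return "Equal"  # the control group
    if diff <= -24:
        return "Extreme Language"
    return None  # gap not covered by any of the four definitions


# Hypothetical (language IQ, nonlanguage IQ) pairs.
for lang, nonlang in [(95, 125), (104, 122), (112, 108), (130, 100), (110, 120)]:
    print(lang, nonlang, iq_pattern_group(lang, nonlang))
```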

Pupil records were examined to identify 
those children whose IQ divergencies fell
in the categories noted above. After all
such children were identified, 25 children 
were selected from each of the four 
classifications. These four samples were 
matched, S by S, on the basis of total IQ. 
The four Ss representing each quartet had 
total IQ’s that were either identical or 
within two points of one another. Perfect 
matching was possible in about 90% of 
the cases. As a result of this matching pro- 
cedure, each group had the same mean and 
standard deviation for total IQ. These 
were 113 and 10 respectively. 

The Iowa Tests of Basic Skills, adminis- 
tered in January, 1959, were used to assess 
achievement in vocabulary, reading, lan- 
guage, work study skills, and arithmetic. 
Past research suggested that the pupils in 
the Extreme Language Group could be 
expected to achieve their highest scores 
in those areas most saturated in verbal 
skills: vocabulary, reading, and language 
skills. Conversely, the Nonlanguage Groups 
could be expected to do their best work in 
those areas in which verbal facility is 
somewhat less important: arithmetic and
work study skills. The Equal Group could 
be expected to achieve at more consistent 
levels than the extreme groups in the vari- 
ous areas. Due to the importance of verbal 
skills in school work, it was also hypothe- 
sized that overall achievement would be 
more closely related to the language than 
the nonlanguage scores. These hypotheses 
were tested with respect to mean grade 
equivalents via analysis of variance tech- 
niques. The design involved two dimen- 
sions—IQ patterns and curricular areas— 
and utilized techniques appropriate for 
related measures. 
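The report does not spell out the computational layout of its analysis. One plausible reading, offered here only as an assumption, treats each matched quartet as a block observed under every combination of IQ-pattern group and curricular area. That gives a two-factor design with related measures on both factors: the group effect is tested against blocks × groups, the area effect against blocks × areas, and the interaction against blocks × groups × areas. The sketch computes the sums of squares for that layout; the function names and the tiny data set are hypothetical.

```python
from itertools import product


def related_two_factor_ss(y):
    """Sums of squares and df for a groups x areas design related through blocks.

    y[q][g][a]: score of matched block q, IQ-pattern group g, curricular area a.
    Returns {source: (SS, df)}.
    """
    Q, G, A = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(Q), range(G), range(A)))
    grand = sum(y[q][g][a] for q, g, a in cells) / len(cells)

    m_q = [sum(y[q][g][a] for g in range(G) for a in range(A)) / (G * A) for q in range(Q)]
    m_g = [sum(y[q][g][a] for q in range(Q) for a in range(A)) / (Q * A) for g in range(G)]
    m_a = [sum(y[q][g][a] for q in range(Q) for g in range(G)) / (Q * G) for a in range(A)]
    m_qg = [[sum(y[q][g]) / A for g in range(G)] for q in range(Q)]
    m_qa = [[sum(y[q][g][a] for g in range(G)) / G for a in range(A)] for q in range(Q)]
    m_ga = [[sum(y[q][g][a] for q in range(Q)) / Q for a in range(A)] for g in range(G)]

    ss = {
        "blocks": G * A * sum((m - grand) ** 2 for m in m_q),
        "groups": Q * A * sum((m - grand) ** 2 for m in m_g),
        "areas": Q * G * sum((m - grand) ** 2 for m in m_a),
        "blocks x groups": A * sum((m_qg[q][g] - m_q[q] - m_g[g] + grand) ** 2
                                   for q in range(Q) for g in range(G)),
        "blocks x areas": G * sum((m_qa[q][a] - m_q[q] - m_a[a] + grand) ** 2
                                  for q in range(Q) for a in range(A)),
        "groups x areas": Q * sum((m_ga[g][a] - m_g[g] - m_a[a] + grand) ** 2
                                  for g in range(G) for a in range(A)),
    }
    ss_total = sum((y[q][g][a] - grand) ** 2 for q, g, a in cells)
    ss["blocks x groups x areas"] = ss_total - sum(ss.values())  # residual

    df = {
        "blocks": Q - 1, "groups": G - 1, "areas": A - 1,
        "blocks x groups": (Q - 1) * (G - 1),
        "blocks x areas": (Q - 1) * (A - 1),
        "groups x areas": (G - 1) * (A - 1),
        "blocks x groups x areas": (Q - 1) * (G - 1) * (A - 1),
    }
    return {k: (ss[k], df[k]) for k in ss}


def f_ratio(table, effect, error):
    """F = MS(effect) / MS(error)."""
    (ss_e, df_e), (ss_r, df_r) = table[effect], table[error]
    return (ss_e / df_e) / (ss_r / df_r)


def sign(i):
    return 1 if i == 0 else -1


# Made-up 2 x 2 x 2 data: additive effects plus a pure three-way component,
# so every two-way interaction sum of squares should come out as zero.
example = [[[q + 10 * g + 100 * a + sign(q) * sign(g) * sign(a)
             for a in range(2)] for g in range(2)] for q in range(2)]
table = related_two_factor_ss(example)
for source, (ss_value, df_value) in table.items():
    print(f"{source:<26} SS = {ss_value:9.2f}  df = {df_value}")
print("F(groups x areas) =", f_ratio(table, "groups x areas", "blocks x groups x areas"))
```

With real data each F ratio would then be referred to the F distribution with the listed degrees of freedom.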

Pupils at the fourth-grade level develop 
abilities in a number of areas which are 
not assessed by an achievement battery 
such as the Iowa Tests of Basic Skills. 
For example, the student’s skills in vari- 
ous crafts, art work, oral language, cre- 
ative self-expression, and physical activi- 
ties are not sampled by these tests. To 
evaluate proficiency in these areas, a ques- 
tionnaire was administered to each child’s 
teacher in an interview. It was assumed
that if a child had unusual talents or weak- 
nesses not measured by objective tests, the 
child’s teacher would be qualified to evalu- 
ate them. Since the teacher interviews took 
place in the spring, the teachers had had 
sufficient time to observe the children and 
formulate judgments about their abilities. 
Teachers were not told to which IQ pattern 
group each child belonged. 

The questions asked of each teacher are 
listed below, followed by the categories in 
which the answers were tabulated. The 
questions were formulated to permit fairly 
short objective responses, although in the 
actual interview teachers often enlarged 
upon their answers. The replies to each 
question were summarized by IQ pattern 
groups, and the data were analyzed via
chi-square tests.*


1. In what half of his class is this child? 
(Upper or lower.) (Scored for agreement 
with Iowa Tests of Basic Skills.) 

2. Do you think his achievement is on a 
par with his intelligence? (Yes or no.) 

3. Have you any reason to think his ver- 
bal activities, e.g., reading, writing or speak- 
ing, give a misleading indication of his in- 
telligence? (Yes or no.) 

4. Does this child appear to like arts and 
crafts and working with his hands better 
than reading, arithmetic and other bookish 
activities? (Yes or no.) 

5. How bright is this child when it comes 
to tasks involving insight or reasoning? 
(Above average, average, or below average.) 

6. How well does this child put his 
thoughts into words? (Better than average, 
average, or poorer than average.)

7. How would this child’s oral language 
ability compare with his ability to express 
himself on paper? (Better oral, same, or 
worse oral.) 

8. Is this child one of the slowest or one 
of the fastest readers in his class? (Fastest 
10%, middle 80%, slowest 10%.) 

9. Does he have any special learning diffi- 
culties other than reading? (Yes or no.) 





* Because of the matching procedures that 
were employed, the groups were not wholly 
independent. However, the correlations 
among groups proved to be rather small and 
nonsignificant. Thus, it was concluded that 
the χ² test could be safely applied.
10. Does this pupil have any special skills
—things he does somewhat better than other 
children? (Excels at intellectual tasks or ex- 
cels at physical tasks.) 

11. Is he better in reading or arithmetic? 
(Reading or arithmetic.) 

12. Which does he like more, arithmetic 
or reading? (Reading or arithmetic.) 

13. When he has free time in school, what 
kind of activity does he engage in? (Mental 
activity or physical activity.) 
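The chi-square tests mentioned above compare how replies to each question are distributed across the four IQ-pattern groups. A minimal sketch of the Pearson chi-square statistic for an r × c table of counts, assuming rows are groups and columns are reply categories; the function name and the counts in the usage line are invented for illustration and are not the study's data. As the authors' footnote acknowledges, the matched groups are not wholly independent, so the nominal test is an approximation here.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if any(t <= 0 for t in row_totals + col_totals):
        raise ValueError("every row and column total must be positive")
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical yes/no replies for four groups of 25 pupils each.
stat, df = chi_square([[20, 5], [15, 10], [10, 15], [5, 20]])
print(stat, df)  # 20.0 3
```

The .05 critical value of chi-square with 3 degrees of freedom is 7.815, so a statistic this large would be judged significant.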


RESULTS 


Achievement Tests. Mean grade equiva- 
lents for each group in the five major areas 
measured by the Iowa Tests of Basic Skills 
are presented in Table 1. In all areas of 
this achievement battery—Vocabulary, 
Reading, Language, Work Study Skills, 
and Arithmetic—the mean grade equiva- 
lents ranked the groups in the following 
order: Extreme Language, Equal, Moder- 
ate Nonlanguage, Extreme Nonlanguage. 
This ranking is identical to the ranking in 
language intelligence, and the exact reverse 
of the ranking in nonlanguage intelligence. 
In no area did a group of lower verbal 
intelligence exceed a group of higher verbal 
intelligence. 

The achievement test data were ana- 
lyzed via a two-factor analysis of variance. 
The interaction of IQ pattern groups and 
curricular areas, which reflects deviations 
from parallelism among the profiles of 
averages, was found to be significant at 
the .05 level. The main effect of groups,
which represents a test of the difference in 
composite achievement, was also significant 
at this level. 
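The interaction test asks whether the groups' profiles of mean grade equivalents run parallel across the curricular areas. A minimal sketch of the quantity behind it, assuming a table of cell means indexed by group and area (the function name and the numbers are hypothetical). Each residual is a cell mean minus its row and column means plus the grand mean, and all residuals are zero exactly when the profiles are parallel.

```python
def interaction_residuals(means):
    """Deviations of cell means from a parallel-profiles (additive) model.

    means[g][a] is the mean grade equivalent of group g in area a.
    """
    G, A = len(means), len(means[0])
    row = [sum(r) / A for r in means]                              # group means
    col = [sum(means[g][a] for g in range(G)) / G for a in range(A)]  # area means
    grand = sum(row) / G
    return [[means[g][a] - row[g] - col[a] + grand for a in range(A)]
            for g in range(G)]


parallel = [[4.0, 5.0, 6.0], [3.0, 4.0, 5.0]]  # same shape, shifted by 1.0
crossed = [[5.0, 4.0], [4.0, 5.0]]             # profiles cross
print(interaction_residuals(parallel))  # every residual is 0.0
print(interaction_residuals(crossed))   # [[0.5, -0.5], [-0.5, 0.5]]
```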

In general, children who obtained the su- 
perior language intelligence scores evi- 
denced greatest strength in Reading, Vo- 
cabulary, and Language. Those children 
who obtained superior nonlanguage in- 
telligence scores did not do quite so well in 
Reading and Language as they did in 
Work Study, Arithmetic, and Vocabulary. 
Their performance in the latter area was 
quite surprising; it had been hypothesized 
that the Vocabulary performance of this 
group would be at least as low as their per- 
formance in Reading and Language. 

In view of the significant interaction, 
separate analyses were made of group dif- 
ferences within each of the individual 
achievement areas and among achievement 
areas within each group. These analyses 
indicated that in each area the IQ pat- 
tern groups differed significantly in aver- 
age achievement. Though the magnitudes 
of differences among the groups varied 
from area to area, superior achievement 
was consistently associated with superior 
language intelligence. These results were 
in keeping with the obvious importance of 
verbal abilities in all achievement areas. 

Teacher Questionnaire. Seven of the 
thirteen questions gave rise to significant 
differences (.05 level) in the distributions 
of replies for the four groups. These were 
Questions 1, 6, 8, 10, 11, 12, and 13. 

Question 1, pertaining to the student’s
rank in class, was scored as to agreement
with the composite score on the Iowa Tests 
of Basic Skills. It was hypothesized that 
teachers might recognize in the Nonlan- 
guage Groups skills that are not measured 
by the achievement tests. As a result,
these pupils might sometimes be ranked 
higher by their teacher than they were by 
the tests. This proved to be the case. For 
almost 100% of the Extreme Language 
Group, teacher judgment and test scores 
agreed. However, for about one-third of 
the Extreme Nonlanguage Group, teacher
judgment and test scores disagreed. Pre- 
sumably, for this one-third teachers were 
aware of and were considering factors not 
reflected by the test. Data for the inter- 
mediate groups were consistent with the 
trends set by the extremes. 
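Scoring Question 1 for agreement requires a test-based "half" to compare with the teacher's judgment. The report does not say how that half was fixed. This sketch assumes a split at the class median of the composite score, with a pupil exactly at the median counted as "lower". The function names and the example values are hypothetical.

```python
from statistics import median


def half_by_test(composite, class_composites):
    """'upper' if a pupil's composite exceeds the class median, else 'lower'.

    Assumption: a pupil exactly at the median is counted in the lower half.
    """
    return "upper" if composite > median(class_composites) else "lower"


def agreement_rate(teacher_halves, test_halves):
    """Proportion of pupils whose teacher-assigned half matches the test-based half."""
    if len(teacher_halves) != len(test_halves) or not teacher_halves:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(t == s for t, s in zip(teacher_halves, test_halves))
    return matches / len(teacher_halves)


# Hypothetical class of five composite grade equivalents.
composites = [3.1, 3.8, 4.2, 4.9, 5.5]
test_based = [half_by_test(c, composites) for c in composites]
teacher_says = ["lower", "lower", "upper", "upper", "upper"]
print(test_based, agreement_rate(teacher_says, test_based))
```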

Question 6 concerned the child’s verbal 
fluency in oral language. A clear trend 
was revealed for nonfluency to increase 
from the Extreme Language to Extreme 
Nonlanguage Group. Only one child in the 
latter group was regarded to be above av- 
erage in his ability to put thoughts in 
words, while over half of the Extreme 
Language Group was considered to be 
above average. 

Question 8 concerned the speed of read-
ing of each child. A definite trend was in
evidence that children with lower verbal
intelligence were more often among the
slowest readers in their classes. 

Question 10 was included in the ques- 
tionnaire to discover the ways in which 
the talents of the pupils with superior non- 
language intelligence are revealed. A vari- 
ety of responses was elicited to this ques- 
tion. In addition to activities involved in 
the basic skill subjects, teachers mentioned 
activities which demand unusual levels of 
manual dexterity, muscular coordination 
and artistic creativity. Nineteen of the 
50 Nonlanguage children and 25 of the 
50 Language children were thought to have 
special skills. For summary purposes, re- 
sponses were grouped into physical and 
intellectual categories. Of those children 
considered to have special skills, 8 from 
the Nonlanguage Groups and 21 from the 
Equal and Extreme Language Groups evi- 
denced superiority in intellectual areas. 
The more frequent recognition of unusual 
talent in the average or above-average lan- 
guage groups is not startling, since school 
work offers considerable opportunity for 
the exercise of verbal talents. That teach- 
ers could identify unusual talents in a sub- 
stantial proportion of the nonlanguage 
children is perhaps more surprising and 
educationally encouraging. If at least mod-
erate validity can be claimed for teacher
judgments, the finding suggests that the 
child with superiority in favor of non- 
language intelligence may have highly de- 
veloped skills which the teacher can rec- 
ognize and exploit to the child’s advantage. 

Question 11 was intended to test teacher 
awareness of the pattern of pupil achieve- 
ment. The replies corroborated the achieve- 
ment test data concerning the compara- 
tive average achievement of each group in 
reading and arithmetic. In the teachers’ 
opinions, half of the Extreme Nonlanguage 
Group was better in reading, half was 
better in arithmetic. On the other hand, 
85% of the Extreme Language Group 
were placed at a higher level in reading 
and 15% at a higher level in arithmetic.
Intermediate groups were consistent with 
this trend. 

Question 12 concerned pupil preferences 
for reading or arithmetic. The replies to 
this question were almost identical to the 
preceding one. They suggest that a pupil’s 
preference is consistent with his profi- 
ciency, or that teachers believe it is. 

Question 13 concerned the student’s 
choice of free time activity. Extreme Non- 
language children showed a much stronger 
preference for physical than mental ac- 
tivity. The Extreme Language group 
showed a comparably strong preference 
for mental activity. The intermediate 
groups showed progressively greater pref- 
erence for mental activity as the IQ dif- 
ference changed in the direction of verbal 
superiority. 

The fact that the other questions did not 
yield significantly different distributions of 
replies does not prove that differences be- 
tween the groups do not exist in the abili- 
ties being considered. The statistical tests 
may have had insufficient power to reveal 
true differences, or teachers may have had 
too little information to make valid judg- 
ments. Despite these possibilities, the ab- 
sence of significant differences on at least 
several of the questions seems worthy of 
consideration. For example, fairly large
group differences might have been expected 
in reply to the question: “Do you think 
his achievement is on a par with his in- 
telligence?” The four groups had, in fact, 
the same average intelligence quotient, yet 
they differed quite markedly in overall 
achievement. If teachers were aware of the 
intelligence test scores of these children, 
they might have considered the Extreme 
Language Group to be achieving at un- 
usually high levels and the Extreme Non-
language children to be achieving at un-
usually low levels. Yet no such trend was 
apparent in the data. Significant differ- 
ences might also have been expected for 
Question 3: Have you any reason to think 
verbal activities give a misleading indica- 
tion of his intelligence? If teachers were 
acutely aware of the unique patterns of in- 
telligence characterizing these children, and 
if the average of the verbal and nonverbal 
quotients is taken as an overall measure 
of pupil brightness, the verbal proficiencies 
of the extreme groups should have ap- 
peared misleading. For the Extreme Non- 
language Group, performance in verbal ac- 
tivity might have suggested the children 
were duller than they really are; for the 
Extreme Language Group, verbal perform- 
ance might have suggested they were 
brighter than they really are. Again, teach- 
ers observed no such tendencies. Perhaps 
these teachers were unfamiliar with the re- 
corded quotients of the children. Perhaps, 
also, like most teachers, they depended 
more heavily upon achievement data than 
on intelligence test data for inferences 
about pupil intelligence. 

The fact that no significant differences 
were observed among the groups in their 
ability to handle tasks involving insight 
or reasoning would be of interest, if it 
were confirmed by objective data. It seems 
probable, however, that more reliable data 
would reveal group differences on such 
tasks. The mastery of arithmetic concepts 
and problems involves the use of insight 
and reasoning, and in this area the Lan-
guage Groups surpassed the Nonlanguage
Groups. The failure of teachers to observe 
clear-cut differences probably reflects on 
the inadequacy of teacher judgment, but 
further study of this characteristic of the 
groups is needed. 

In general, the questionnaire data cor- 
roborated those obtained from the achieve- 
ment tests. There was evidence that teach- 
ers recognized unique talents in both the 
Nonlanguage and Language Groups. There 
was no evidence, however, that the differ- 
ence score can be used to direct teacher 
attention to instructional possibilities 
which might otherwise be ignored. Sys- 
tematic study of groups which are equal 
in verbal intelligence but markedly differ- 
ent in nonverbal intelligence would prob- 
ably be of considerable value. 


SUMMARY


Four fourth-grade groups whose lan- 
guage and nonlanguage intelligence quo- 
tients differed in amounts ranging from 24 
or more points in favor of the language 
quotient to a similar discrepancy in favor 
of the nonlanguage quotient were com- 
pared in achievement. Data were gathered 
via standardized tests and teacher ob- 
servations. The groups, which were 
matched on total IQ, were found to differ 
significantly in their proficiency in read- 
ing, oral and written language, vocabulary, 
work study skills, and arithmetic. In each 
area the group with marked verbal su-
periority ranked first, the group with
marked nonverbal superiority ranked last.
The largest differences among groups oc-
curred in reading, language, and vocabu-
lary, the smallest in work study skills and
arithmetic. Teachers were more often 
aware of special talents in the Language 
pupils than in the Nonlanguage pupils. A 
definite preference for free time activity of a
THE RELATIONSHIP BETWEEN “CAUSAL” ORIENTATION,
ANXIETY, AND INSECURITY IN ELEMENTARY 
SCHOOL CHILDREN 


ROLF E. MUUSS* 
Goucher College 


This paper presents the findings of a 
study designed to investigate the relation- 
ship between “causality” and such indices 
of mental health as anxiety and insecurity 
in fifth- and sixth-grade children. The con- 
cept, causality, constitutes the theoretical 
framework of the Preventive Psychiatry 
Program at the State University of Iowa. 
The Preventive Psychiatry Program is de- 
signed to investigate the extent to which 
causal orientation contributes to mental 
health, and whether and to what extent an 
experimental learning program emphasiz- 
ing the causal nature of human behavior 
produces causally oriented subjects. 

Causality is defined as an understanding 
and appreciation of the dynamic, complex, 
and interacting nature of the forces that 
operate in human behavior. It involves an 
attitude of flexibility, of seeing things 
from the viewpoint of others as well as
an awareness of the probabilistic nature of 
knowledge. A causally oriented person is 
capable of suspending judgment until suf- 
ficient factual information is available; 
furthermore, he realizes that his behavior 
has consequences and that there are al- 
ternative ways of solving social problems 
(Muuss, 1960). It is assumed that a per- 
son who is aware of the dynamic and causal 
nature of human behavior is better able to 
solve his own problems and to meet social 
situations. This study purports to investi- 
gate a limited aspect of this assumption. 

Explicitly stated, our hypotheses are
that Ss who are high causally oriented (as 


*This study was completed while the au- 
thor was with the Iowa Child Welfare Re- 
search Station, State University of Iowa. 
Appreciation is expressed to the Grant Foun- 
dation for their support of this research 
project. 


measured by paper-and-pencil tests) will 
differ from Ss who show a low degree of 
causal orientation on the following cri- 
terion variables: 

Hypothesis 1. They will demonstrate less 
insecurity as measured by the Kooker 
Security-Insecurity Scale (Kooker, 1954).

Hypothesis 2. They will show less anxi-
ety as measured by the Children’s Mani-
fest Anxiety Scale (CMAS) (Castaneda,
McCandless, & Palermo, 1956).

Hypothesis 3. They will make fewer L 
responses as measured by the L scale 
which is part of the CMAS. 

The rationale for these hypotheses is 
based on the assumption that a lack of 
insight into the dynamics of one’s own be- 
havior and an unwillingness and/or an in- 
ability to understand the problems and 
the behavior of others tends to increase 
the level of anxiety and the degree of in- 
security. If other people’s behavior is not 
understood it will tend to be threatening, 
as are physical events which an individual 
experiences and does not understand. A 
lack of insight into the dynamics of be- 
havior will tend to make it difficult to 
react logically to the behavior of others. 
Furthermore, if behavior is not understood, 
it may be misinterpreted and the individ- 
ual may react in such a way as to produce 
a threat to the other person’s security 
and self-respect. This then might tend to 
generate conflict which would add further 
to the difficulty of the situation. If at times 
a person does not understand his own be- 
havior and the factors that influence him 
he naturally feels threatened, insecure, 
and anxious. Once a person does under- 
stand himself and others he is more willing 
to agree with such statements as: he 
sometimes gets angry, he does not always
tell the truth, he does not like everyone, 
and similar items that make up the L 
scale. 

Hypothesis 4. Furthermore, it is hypoth- 
esized that these differences—if they have 
any generality at all—will be equally ob- 
vious for fifth- and sixth-grade Ss. 

Hypothesis 5. The two measures used, 
insecurity scores and anxiety scores, will 
show a positive relationship to one an- 
other. 

Hypothesis 6. Since about half of the Ss 
have been exposed to an experimental 
learning program and half have served as 
controls in another research study, it is 
hypothesized that a significantly larger 
portion of the high causally oriented Ss 
came from the experimental classes and 
that a larger portion of the low causally 
oriented Ss came from control classes. 


PROCEDURE 


In order to investigate the above stated 
hypotheses, two sets of tests were ad- 
ministered to 280 sixth- and 179 fifth-grade 
Ss in the schools of a midwestern com-
munity of 80,000. Of these, 224 Ss had been
exposed to an experimental learning pro- 
gram designed to develop a causal orienta- 
tion, while 235 Ss came from regular class- 
rooms and served as controls in another 
research study. The fact that the Ss came 
from both experimental and control classes 
was only utilized in testing the sixth of the 
previously stated hypotheses. However, 
there is some justification in utilizing Ss 
from two different groups since there is 
evidence (Stiles, 1950) that children from 
regular classrooms do not have much of an 
understanding and appreciation of the 
causal nature of human behavior. There- 
fore, in order to obtain a wide range of 
causal orientation, the experimental and 
the control Ss were combined so that dif- 
ferences between high and low causally 
oriented Ss could be studied more effec- 
tively. 

The following two tests served as se-
lection criteria to determine high causally
oriented and low causally oriented Ss: 

1. The Social Causal Test* 

2. The Physical Causal Test* 

Of these two tests only the Physical Causal
Test had a significant correlation (r = .58,
N = 251) with IQ. This might explain why
the high causal group differed signifi-
cantly from the low causal group in terms
of IQ scores. However, this is not a serious
shortcoming since it will be shown that the 
criterion variables, insecurity, anxiety, and 
L scores, have a relatively low relation- 
ship to IQ. The high causally oriented Ss, 
for the purpose of this study, are defined 
as all those Ss who fell above the corre- 
sponding grade mean on both the above 
described selection criteria, low causally 
oriented Ss as those Ss who fell below the 
corresponding group mean on both selec- 
tion criteria. 
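The selection rule can be written as a small classification step. This is a minimal sketch, assuming each subject has one score on each selection test and a grade label; the function name `classify_causal` and the scores in the usage line are hypothetical. Subjects who fall above the grade mean on one test and below it on the other (or exactly at a mean) are left unclassified, which is what the definition implies.

```python
from statistics import mean


def classify_causal(social, physical, grade):
    """Label each subject "high", "low", or None (not selected).

    social, physical: scores on the two selection tests, in the same order.
    grade: the grade of each subject, e.g. 5 or 6.
    """
    # Mean of each selection test, computed separately within each grade.
    grade_means = {}
    for g in set(grade):
        members = [i for i, gi in enumerate(grade) if gi == g]
        grade_means[g] = (
            mean(social[i] for i in members),
            mean(physical[i] for i in members),
        )

    labels = []
    for s, p, g in zip(social, physical, grade):
        mean_s, mean_p = grade_means[g]
        if s > mean_s and p > mean_p:
            labels.append("high")   # above the grade mean on both tests
        elif s < mean_s and p < mean_p:
            labels.append("low")    # below the grade mean on both tests
        else:
            labels.append(None)     # mixed or tied result: not selected
    return labels


# Hypothetical scores for four fifth graders
print(classify_causal([10, 4, 8, 2], [9, 3, 2, 8], [5, 5, 5, 5]))
# ['high', 'low', None, None]
```

Computing the means within each grade keeps a fifth grader from being judged against sixth-grade norms.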

The second set of tests, constituting the 
criterion variables, consisted of: 

1. The Kooker Security-Insecurity Scale. Kooker ratings are obtained by a trained observer who follows the Ss during the whole school day for a period of five days and rates how often each child shows each of 19 behavior items indicative of security or insecurity. When several observers independently rated the same Ss, between-rater correlations ranged from .63 to .86 (Kooker, 1954).

2. The Children’s Manifest Anxiety 
Scale (CMAS) (Castaneda, McCandless, 
& Palermo, 1956). 

3. The 11-item L scale, which is administered interspersed with the CMAS.


THE DESIGN OF THE STUDY


From the 280 sixth-grade Ss, 90 met the
selection criteria and were classified in the 


* The Social Causal Test is described else- 
where in the literature (Lyle & Levitt, 1955) 
(Ojemann, Levitt, & Whiteside, 1955). 

* The Physical Causal Test was developed in part by the author. Part of the test was adapted by the author from Clark (1953) for use with fifth- and sixth-grade Ss.
TABLE 1
DISTRIBUTION OF ORIGINAL AND CORRECTED NUMBER OF CASES IN THE FACTORIAL ANALYSIS OF VARIANCE

                 Original N for the    Original N for the     Corrected N* in Order
                 Security Scores       Anxiety and L Scores   to Obtain Proportionality
                 Degree of Causality   Degree of Causality    Degree of Causality
Grade Level      High     Low          High     Low           High     Low
5th grade         59       41           58       45            60       45
6th grade         89       72           89       71            88       66

* This corrected distribution was used for all three sets of scores: anxiety, security, and L scores (see text).


high causal group; 72 fell below both group means on the selection criteria and were classified as the low causal group. Similarly, of the 179 fifth-grade Ss, 59 met the criteria for the high causal group and 45 fell into the low causal group. In order to test for the effect of grade level (G) (fifth or sixth grade) and the degree of causality (C) (high and low, as defined above), an analysis of variance was computed for each of the three criterion variables: insecurity, anxiety, and L scores. A factorial analysis of variance design was used for all three sets of scores. Since this design requires either an equal number of cases in each cell or proportionality of cases from column to column or from row to row (Lindquist, 1953), the data had to be adjusted. Proportionality could be obtained with the least loss of cases by using the corrected distribution shown in Table 1. Proportionality of cases (Table 1, Corrected N) was obtained either by randomly omitting cases from the original N or by adding cases whose score was the corresponding cell mean of the original N. The original N's for the security and the anxiety scores are also presented in Table 1.
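The adjustment can be checked arithmetically from Table 1. In this minimal sketch, the corrected counts keep the same 4:3 high-to-low ratio in both grades. Counting the cases that had to be added with the cell mean then reproduces the error df shown in Table 2 (250 for security, 253 for anxiety and L scores), on the reading given in Table 2's footnotes that one df is deducted per added mean. The function names are illustrative, not from the source.

```python
from fractions import Fraction

# Cell sizes (grade, degree of causality) -> N, copied from Table 1
ORIGINAL_SECURITY = {(5, "high"): 59, (5, "low"): 41, (6, "high"): 89, (6, "low"): 72}
ORIGINAL_ANXIETY_L = {(5, "high"): 58, (5, "low"): 45, (6, "high"): 89, (6, "low"): 71}
CORRECTED = {(5, "high"): 60, (5, "low"): 45, (6, "high"): 88, (6, "low"): 66}


def is_proportional(cells):
    """In this 2 x 2 layout, frequencies are proportional when the
    high:low ratio is identical in every grade."""
    ratios = {Fraction(cells[(g, "high")], cells[(g, "low")]) for g in (5, 6)}
    return len(ratios) == 1


def cases_filled_with_mean(original, corrected):
    """Number of cases that had to be added (each given the cell mean)."""
    return sum(max(corrected[k] - original[k], 0) for k in corrected)


def within_cells_df(corrected, n_added):
    """Error df: N minus the number of cells, minus one df per added mean."""
    return sum(corrected.values()) - len(corrected) - n_added


print(is_proportional(CORRECTED), is_proportional(ORIGINAL_SECURITY))  # True False
for name, original in [("security", ORIGINAL_SECURITY), ("anxiety/L", ORIGINAL_ANXIETY_L)]:
    added = cases_filled_with_mean(original, CORRECTED)
    print(name, added, within_cells_df(CORRECTED, added))
# security 5 250
# anxiety/L 2 253
```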


RESULTS 


The analysis of variance data are re- 
ported in Table 2. Inspection of Table 2 


reveals that there is a highly significant 
(p < .001) difference in the predicted di- 
rection between the high causally oriented 
and the low causally oriented Ss on the se- 
curity variable. Table 3 contains the means 
and standard deviations. The summary ta- 
ble (Table 2) also shows a significant dif- 
ference (p < .025) between fifth- and 
sixth-grade Ss. It is interesting to observe, 
however, that the fifth-grade Ss appear to 
be more secure (M = 9.16) than the sixth-grade Ss (M = 10.90). One might speculate as to whether the onset of pubescence in sixth-grade Ss contributes to this unexpected finding, or whether the anticipated change to junior high school contributes to insecurity. The grade by causality interaction effect is nonsignificant. Independent t tests for fifth- and sixth-grade Ss (Table 3) show significant differences for both groups.
The data in Table 3 further show that 


TABLE 2
SUMMARY OF ANALYSIS OF VARIANCE OF INSECURITY, ANXIETY, AND L SCORES
(N = 259)

Types of Scores and           df      Mean       F       p
Source of Variance                    Squares
Insecurity Scores
  Grades (G)                  1       187.76      5.25   <.025
  Causality (C)               1       753.19     21.04   <.001
  G × C Interaction           1        50.89      1.42   NS
  Within cells (w)            250*     35.79
Anxiety Scores
  Grades (G)                  1       233.05      5.62   <.025
  Causality (C)               1       862.22     20.80   <.001
  G × C Interaction           1          .93       .02   NS
  Within cells (w)            253**    41.45
L Scores
  Grades (G)                  1          .02      .004   NS
  Causality (C)               1       132.20     26.73   <.001
  G × C Interaction           1         6.57      1.33   NS
  Within cells (w)            253**     4.95

* Five degrees of freedom deducted from df for error since five mean values were added in order to obtain proportionality.

** Two degrees of freedom deducted from df for error since two mean values were added in order to obtain proportionality.
homogeneity of variance may be assumed.
Thus, we feel justified in concluding that 
high causally oriented Ss are more secure 
than low causally oriented Ss as measured 
by the Kooker Security-Insecurity Scale. 
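Each F in Table 2 is the effect mean square divided by the within-cells mean square, with 1 and df_w degrees of freedom. Recomputing the ratios from the printed mean squares is a quick consistency check; the sketch below does only that (for example, 187.76 / 35.79 is about 5.25). It does not recompute the sums of squares, which would require the raw scores. The dictionary layout and function name are illustrative.

```python
# Mean squares copied from Table 2; each effect has 1 df.
TABLE_2 = {
    "insecurity": {"within": (250, 35.79), "G": 187.76, "C": 753.19, "GxC": 50.89},
    "anxiety": {"within": (253, 41.45), "G": 233.05, "C": 862.22, "GxC": 0.93},
    "L": {"within": (253, 4.95), "G": 0.02, "C": 132.20, "GxC": 6.57},
}


def f_ratios(rows):
    """Return {effect: (F, (df_effect, df_within))}, F = MS_effect / MS_within."""
    df_within, ms_within = rows["within"]
    return {
        effect: (ms / ms_within, (1, df_within))
        for effect, ms in rows.items()
        if effect != "within"
    }


for score, rows in TABLE_2.items():
    for effect, (f, df) in f_ratios(rows).items():
        print(f"{score:10s} {effect:4s} F{df} = {f:.2f}")
```

The recomputed values agree with the printed F ratios to within rounding of the mean squares.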

The analysis of variance summary re- 
lating to the second hypothesis—that high 
causally oriented Ss have less anxiety than 
low causally oriented Ss—is also reported 
in Table 2. There is a significant (p < .001) difference between the high causally and the low causally oriented groups on the anxiety variable. As Table 3 indicates, the difference is in the predicted direction: high causally oriented Ss manifest less anxiety, as measured by the CMAS, than low causally oriented Ss. Again there is a significant difference (p < .025) between fifth- and sixth-grade Ss. However, the trend on the anxiety scale is in the opposite direction from that on the insecurity measure: sixth graders are more insecure than fifth graders, but less anxious. This finding throws some doubt on the fifth hypothesis, which stated that anxiety and insecurity are positively related variables. The grade by causality interaction effect on CMAS scores is nonsignificant. Independent t tests for fifth- and sixth-grade Ss demonstrate significant differences (p < .005) for both grades. Tests for homogeneity of variance show that the assumption of homogeneity of variance is justified. In conclusion, the evidence supports the hypothesis that high causally oriented Ss are less anxious, as measured by the CMAS, than low causally oriented Ss.

The third hypothesis predicted that there would be a relationship between causality and L responses on the L scale. Summary Table 2 demonstrates that there is a significant difference (p < .001) between the high causal and the low causal groups on the L variable. The score on the L scale is obtained by counting the L responses. Thus, a low L score might be interpreted as honesty, and a high L score as faking, dishonesty, or a lack of self-insight. Inspection of the means (Table 3) demonstrates that the differences are in the predicted direction. The mean L score for the high causal group is 2.28; for the low causal group it is 3.72. There is no significant difference between fifth and sixth grades, and there is no significant interaction effect between grade level and causality. Independent t tests for fifth and sixth grades demonstrate significant differences between the high and the low causal groups. Homogeneity of variance may be assumed since the variance ratio is nonsignificant. Thus, there is support for the hypothesis that high causally oriented Ss give fewer L responses, as measured by the 11-item L scale, than low causally oriented Ss.


TABLE 3
COMPARISON OF THE MEAN SCORES OF THE HIGH AND LOW CAUSALLY ORIENTED GROUPS ON THE KOOKER SECURITY-INSECURITY RATING SCALE, THE MANIFEST ANXIETY SCALE, AND THE L SCALE

                      High Causal Group        Low Causal Group
                      N      M      SD         N      M      SD        t
Security Scores(a)
  5th-grade Ss        60     8.15   5.26       45    10.51   5.86      2.11*
  6th-grade Ss        88     9.10   5.22       66    13.29   7.27      3.94**
  Total              148     8.72   5.22      111    12.16   6.81      4.42**
Anxiety Scores(b)
  5th-grade Ss        60    11.63   5.65       45    15.47   6.23      3.21**
  6th-grade Ss        88     9.81   5.94       66    13.39   7.67      3.13**
  Total              148    10.55   5.86      111    14.28   7.19      4.42**
L Scores(c)
  5th-grade Ss        60     2.10   1.96       45     3.93   2.44      4.09**
  6th-grade Ss        88     2.40   2.25       66     3.58   2.22      3.22**
  Total              148     2.28   2.13      111     3.72   2.30      5.14**

a Low Kooker scores imply greater security.
b High scores reflect high anxiety.
c High scores indicate many L responses.
* Significant at the .05 level.
** Significant at the .005 level.
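Table 3 does not say which formula produced its t values. As a hedged check, the sketch below assumes the common textbook form in which the printed SDs were computed with N in the denominator, so each squared SD is divided by N - 1 to estimate the squared standard error. Under that assumption, the fifth- and sixth-grade rows for security and L scores tried here come out within about .01 of the printed values. The function name is illustrative.

```python
from math import sqrt


def t_independent(m_high, sd_high, n_high, m_low, sd_low, n_low):
    """t for the difference between two independent group means.

    Assumes each SD was computed with N in the denominator, so the squared
    SD is divided by N - 1. Positive when the low causal group scores higher.
    """
    se_diff = sqrt(sd_high ** 2 / (n_high - 1) + sd_low ** 2 / (n_low - 1))
    return (m_low - m_high) / se_diff


# Rows of Table 3: (high-group N, M, SD), (low-group N, M, SD)
rows = {
    "security, 5th": ((60, 8.15, 5.26), (45, 10.51, 5.86)),
    "security, 6th": ((88, 9.10, 5.22), (66, 13.29, 7.27)),
    "L, 5th": ((60, 2.10, 1.96), (45, 3.93, 2.44)),
    "L, 6th": ((88, 2.40, 2.25), (66, 3.58, 2.22)),
}
for label, ((n1, m1, s1), (n2, m2, s2)) in rows.items():
    print(label, round(t_independent(m1, s1, n1, m2, s2, n2), 2))
```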

The F ratios which show the interaction 
effect of grades by causality in Table 2 
are nonsignificant and thus support the 
fourth hypothesis. This is equally true for 
TABLE 4
INTERCORRELATIONS OF THE CRITERION VARIABLES AND IQ FOR FIFTH-GRADE (N = 137) AND SIXTH-GRADE SUBJECTS (N = 232)

                  Security    Anxiety    L Scores
IQ
  5th grade       −.30**       .09       −.40**
  6th grade       −.28**      −.20**     −.13*
Security
  5th grade                    .002       .10
  6th grade                    .13*       .07
Anxiety
  5th grade                              −.11
  6th grade                              −.13*

* Significant at the .05 level.
** Significant at the .01 level.


all three variables: security, anxiety, and L scores. This finding and the t values reported in Table 3 justify accepting the hypothesis that in this study high causally and low causally oriented Ss respond differently to such indices of mental health as insecurity, anxiety, and L responses, irrespective of whether the Ss come from the fifth or the sixth grade.

Table 4 reports the intercorrelations of 
the three criterion variables, as well as the 
correlation between IQ and each of the 
variables separately for fifth- and sixth- 
grade Ss. 

There is only a small positive correlation of .13 between anxiety scores and insecurity scores for sixth-grade Ss, which barely reaches the .05 level of significance. The magnitude of the correlation does not change when boys (r = .13, N = 120) and girls (r = .16, N = 112) are analyzed separately, but with the smaller N's the correlations are no longer significant. The correlation between anxiety scores and security scores for fifth-grade Ss is, for all intents and purposes, zero. Thus, we do not have conclusive evidence that our measures of anxiety and security are positively related. The data might indicate that the two tests measure different and unrelated aspects or traits.
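The significance statements for these correlations follow from the usual t test of r against zero, t = r√(N − 2)/√(1 − r²) with N − 2 df. The "percentage of variance" remarks later in the section use r². This is a minimal sketch; the critical values in the comments are approximate two-tailed .05 values from standard t tables, not figures from the source.

```python
from math import sqrt


def t_for_r(r, n):
    """t for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def variance_explained(r):
    """Proportion of variance two measures share: r squared."""
    return r ** 2


# Anxiety vs. insecurity, sixth grade: all Ss, then boys only
print(round(t_for_r(0.13, 232), 2))  # 1.99; two-tailed .05 cutoff for df = 230 is about 1.97
print(round(t_for_r(0.13, 120), 2))  # 1.42; cutoff for df = 118 is about 1.98
# IQ vs. security, fifth grade (r = -.30)
print(round(variance_explained(-0.30), 2))
```

The first two lines show why the same r of .13 is "barely significant" for all sixth graders but not for boys alone: only N changes.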


MUUSS 


However, we might also consider an alternative explanation arising from the different nature of the two instruments. The anxiety scale is a paper-and-pencil test, so acquiescence, faking, rationalizing, and deliberate attempts to give the socially desirable response might operate in a systematic fashion, whereas the Kooker scale relies on ratings of behavior by trained observers. Possible shortcomings of the latter scale, such as rater bias and halo effect, would operate in a different fashion. Bruce (1958), using the same tests with sixth-grade Ss, obtained a correlation of .26 (N = 184), significant at the .01 level, thus giving some indirect support for our hypothesis. Furthermore, his finding is basically in agreement with our data for sixth-grade Ss. All that can be said at this point is that there are sufficient indications to warrant further investigation of the relationship between anxiety and insecurity stated in our fifth hypothesis.

The correlation between insecurity scores and L scores is nonsignificant at both grade levels. The correlation between anxiety scores and L scores is nonsignificant for fifth graders and barely reaches the .05 level of significance for sixth graders. When the sixth-grade data are broken down by sex, the correlations are −.11 (N = 120) for boys and −.21 (N = 112) for girls, both nonsignificant. Thus, the correlations of the anxiety scores with the L scores are not in disagreement with those reported by Castaneda et al. (1956), who found r = −.10 (N = 65) for sixth-grade boys and r = .22 (N = 49) for girls, both nonsignificant, even though they report a positive correlation for girls while ours is negative.

As was indicated previously, the rela- 
tionship between IQ and the criterion 
variables is relatively low and not always 
consistent for fifth- and sixth-grade Ss. 
The correlation of the Kooker Security- 
Insecurity Rating Scale and IQ is —.28 
for sixth- and —.30 for fifth-grade Ss, both 
significant at the .01 level. Thus, there is
a tendency for the more intelligent child 
to be rated as more secure. However, a 
correlation of .30 accounts for only 9% of 
the variance. 

The correlation between anxiety scores 
and IQ is —.20 (significant) for sixth 
graders and +.09 (nonsignificant) for fifth 
graders, thus not only small in actual size 
but operating in different directions for 
the different grade levels. McCandless and 
Castaneda (1956) report correlations between IQ and anxiety scores of −.16 (N = 55, nonsignificant) for sixth-grade boys and −.43 (N = 45, significant) for sixth-grade girls.

The L scores show a moderately high, significant correlation (r = −.40, N = 137) with IQ for fifth-grade Ss, but only a barely significant correlation (r = −.13, N = 232) for sixth-grade Ss. Again we feel justified in concluding that only a small amount of the variance is attributable to differences in IQ: a correlation of .40 accounts for only 16% of the variance. There is, however, a small tendency for the less intelligent children to give more L responses than the more intelligent Ss.

In order to further eliminate the influence of IQ on the data, Ss from the high causal group were matched on IQ with Ss from the low causal group. Sixth graders were paired with sixth graders and fifth graders with fifth graders. The IQ difference within each pair was never greater than ±1 IQ point. The data in Table 5 are based on 31 matched pairs of sixth graders and 19 matched pairs of fifth graders. The t test between the high causal and low causal Ss matched on IQ is computed using the standard error of a difference between correlated means. The data indicate that IQ has no effect on the security data: the difference between the means is significant at the .005 level, and the group means for the high and low causal Ss among the 100 paired Ss (Table 5) are almost identical to those for the total 259 Ss (Table 3). Intelligence scores do not influence the results obtained in this study with the Kooker Security-Insecurity Scale.
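The standard error of a difference between correlated means subtracts the covariance that matching introduces. One equivalent route is to work with the within-pair differences directly. The sketch below computes t both ways and shows that they agree. The function names and the four pairs of scores are hypothetical, since the study does not report pair-level data.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_correlated_means(high, low):
    """SE_diff = sqrt(SE_h**2 + SE_l**2 - 2 * r * SE_h * SE_l) for matched pairs."""
    if len(high) != len(low):
        raise ValueError("each high causal subject needs one matched partner")
    n = len(high)
    se_h = stdev(high) / sqrt(n)
    se_l = stdev(low) / sqrt(n)
    r = pearson_r(high, low)
    se_diff = sqrt(se_h ** 2 + se_l ** 2 - 2 * r * se_h * se_l)
    return (mean(low) - mean(high)) / se_diff


def t_from_differences(high, low):
    """Same test computed from the within-pair differences."""
    diffs = [b - a for a, b in zip(high, low)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical insecurity scores for four IQ-matched pairs
high = [8, 10, 6, 12]
low = [11, 12, 10, 15]
print(round(t_correlated_means(high, low), 2))  # 7.35
print(round(t_from_differences(high, low), 2))  # 7.35
```

Because matched partners tend to score alike, the covariance term shrinks the standard error, which is why a matched design can detect a difference with fewer cases.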

For the anxiety data—even though the 
correlations between IQ and anxiety scores 
are lower and less consistent in magnitude 
and in direction for fifth- and sixth-grade 
Ss than for the security scores (Table 4)— 
the t ratio between matched groups is not 
significant, but approaches the .05 level of 
significance. However, the correlation be- 
tween IQ and anxiety scores for the 99 
paired Ss only is —.13, not significant. 
Thus, even though IQ scores appear to 
have some influence on the obtained dif- 
ferences on the anxiety data, this influ- 
ence appears to be negligible. 

The difference between the high causal 
and low causal Ss matched on IQ for the 
L scores is significant at the .02 level. The 
L score means for the matched groups 
(Table 5) are about the same as the means 
for the total groups (Table 3). 


TABLE 5
COMPARISON OF THE MEAN SCORES ON THE KOOKER SECURITY-INSECURITY SCALE, THE MANIFEST ANXIETY SCALE, AND THE L SCALE FOR HIGH CAUSALLY AND LOW CAUSALLY ORIENTED SUBJECTS WITH MATCHED IQ'S
(N = 100)

                    High Causal Subjects     Low Causal Subjects
                    N      M      SD         N(a)    M      SD        t(b)
Security Scores     50     8.60   5.51       49     12.35   7.85      3.05***
Anxiety Scores      50    11.62   6.19       49     14.06   8.04      1.79*
L Scores            50     2.48   2.17       49      3.59   2.38      2.55**

a The security score for one child and the anxiety and L scores for another child were not available.
b Since individuals are matched on the basis of IQ, the standard error of a difference was computed for correlated means.
* Significant at the .10 level.
** Significant at the .02 level.
*** Significant at the .005 level.
We thus can conclude that the results
obtained in this study cannot be explained 
on the basis of differences in intelligence 
which existed between the original high 
causal and low causal groups. 

So far the discussion has dealt only with 
the relationship between the degree of 
causality and indices of mental health, such 
as insecurity, anxiety, and L responses. No 
attention has been directed to the effects 
of a causal learning program in producing 
causally oriented Ss. In line with the sixth 
hypothesis one might ask: To what extent 
did the experimental classes contribute 
subjects to the high and the low causally 
oriented groups? As Table 6 shows, chi-square analysis indicates for both fifth and sixth grades a preponderance of cases from the experimental (causal) classes in the High Causal cell and a preponderance of cases from the control (regular) classes in the Low Regular cell, far beyond the expected frequencies.
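The chi-square values in Table 6 are described as Yates-corrected. Below is a minimal sketch of that computation for a 2 × 2 table, working cell by cell from the expected frequencies (row total × column total / N). With the sixth-grade counts from Table 6 it gives 18.88, matching the table. With the fifth-grade counts it gives about 31.56, slightly below the printed 31.65. The function name is illustrative.

```python
def chi_square_yates(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]].

    Rows: high / low causal group; columns: causal / regular classrooms.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each |O - E| by 0.5 before squaring
            chi2 += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    return chi2


print(round(chi_square_yates(44, 15, 8, 38), 2))   # fifth grade: 31.56
print(round(chi_square_yates(61, 32, 23, 52), 2))  # sixth grade: 18.88
```

With 1 df, both values far exceed the usual .001 cutoff of about 10.83, consistent with the p values in the table.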

Bruce (1958) demonstrated that sixth- 
grade children who had participated for two 
consecutive years in an experimental learn- 
ing program designed to develop a causal 
understanding were more secure and less 
anxious than control Ss who had not par- 


TABLE 6
A COMPARISON OF FIFTH- AND SIXTH-GRADE SUBJECTS FROM CAUSAL AND REGULAR CLASSROOMS WITH RESPECT TO THE SELECTION VARIABLE

                   Causal         Regular
                   Classrooms     Classrooms     Total    χ²(c)    p
Fifth Grade
  High Group       44 (29.2)(a)   15 (29.8)       59
  Low Group         8 (22.8)      38 (23.2)       46
  Total            52             53             105      31.65    <.001
Sixth Grade(b)
  High Group       61 (46.5)(a)   32 (46.5)       93
  Low Group        23 (37.5)      52 (37.5)       75
  Total            84             84             168      18.88    <.001

a Figures in parentheses are expected frequencies.
b Reproduced with permission from Child Development Publications from a previous publication of the author (Muuss, in press).
c Chi square with Yates correction for continuity.


ticipated in such a program. He used the 
same tests which were used in this study 
but did not report the L scores. It is in- 
teresting to observe that he obtained sig- 
nificant differences only between the two- 
year group and the control group, but not 
between the one-year group and the con- 
trol group. In this study a comparison was 
made between high causally and low cau- 
sally oriented Ss—a method more sensitive 
to differences than the comparison of ex- 
perimental and control groups, but yield- 
ing basically the same results. 


SUMMARY 


Two tests, the Social Causal Test and the Physical Causal Test, served as selection criteria for high causally (N = 148) and low causally (N = 111) oriented Ss. A comparison was made between the high causal and the low causal fifth- and sixth-grade groups with respect to measures of security, anxiety, and L responses. Data were analyzed by a factorial analysis of variance design followed by t tests. The findings of the study may be summarized as follows:

1. Fifth- and sixth-grade Ss who are high 
causally oriented respond to measures of 
security, anxiety, and L responses in the 
direction that might be considered as in- 
dicative of mental health. The high cau- 
sally oriented subjects show more security, 
less anxiety, and give fewer L responses 
than low causally oriented Ss. 

2. The obtained differences are equally obvious for the fifth-grade (N = 105) and the sixth-grade Ss (N = 154).

3. The differences between grade levels are inconsistent: fifth graders are more secure, while sixth graders are less anxious. There is no difference between grade levels on the L scale.

4. For fifth-grade Ss there is no correlation between insecurity and anxiety; for sixth-grade Ss there is a very low but significant correlation.

5. Even though there are small but significant relationships between intelligence and the criterion variables, the obtained differences between the high causal and low causal groups on the criterion variables cannot be explained on the basis of differences in intelligence.

6. The experimental classes designed to 
develop a causal understanding of the dy- 
namics of human behavior contribute sig- 
nificantly more high causally oriented Ss, 
while the regular control classes contribute 
more low causally oriented Ss to this study. 
PREDICTION OF COLLEGE SUCCESS FROM ELEMENTARY
AND SECONDARY SCHOOL PERFORMANCE*


DALE P. SCANNELL*


State University of Iowa 


The identification of students who will 
succeed in and profit from higher edu- 
cation is of great interest in present Ameri- 
can education. Several approaches may be 
and are taken to obtain information pre- 
dictive of academic success. Advocates of 
one approach insist that predictor instru- 
ments, in addition to yielding accurate 
predictions, should represent an accept- 
able definition of the academic goals of 
public education. This approach recog- 
nizes that scholarship and admission ex- 
aminations in part define the activities 
of elementary and secondary schools and 
influence students’ attitudes toward what 
is necessary to succeed. The recommended 
instruments are so constructed that while 
preparing for the criterion examination, 
students are developing simultaneously 
toward the desired ultimate goals of the 
educational program. Adherents to this 
school of thought also maintain that the 
examination should yield a profile of stu- 
dent accomplishments that can be directly 
related to various academic proficiencies. 

The results of the Iowa Testing Pro- 
grams provide an opportunity to study 
the predictive power of instruments such 
as those suggested above (State Univer. 
Iowa, 1949). Comparable forms of the 
Iowa Tests of Basic Skills (ITBS) have 
been administered yearly since 1939 to 
a large proportion of Iowa pupils in 
Grades 3 through 8, and comparable forms 
of the Iowa Tests of Educational Develop- 
ment (ITED) have been administered 


* This article is based on part of the au- 
thor’s doctoral dissertation at The State Uni- 
versity of Iowa under the direction of A. N. 
Hieronymus and D. W. Norton. The author 
wishes to acknowledge the financial aid of 
The Grant Foundation. 

* Now at the University of Kansas, Law- 
rence. 
yearly to the majority of Iowa high school
students since 1942. 


PURPOSE 


The major purpose of this study was 
to investigate annually obtained compar- 
able achievement measures as predictors 
of college success. In addition, the pre- 
dictive power of measures of school at- 
tainment was studied using these meas- 
ures separately and in combination with 
achievement test scores. 


PROCEDURE 


A base sample was obtained of 3202 
students who had taken the ITED as high 
school seniors during the years 1948 to 
1952 and who enrolled the following fall in 
either the State University of Iowa (SUI) 
or Iowa State College (ISC). A sample 
this large was obtained to provide flexibility in establishing subsamples of students not having data common to all
students in the base sample. That is, not 
all students in the base sample had 
achievement test data for each of the grade 
levels included in the study; thus, a large 
base sample was necessary for obtaining 
subsamples of adequate size for studying 
the predictive power of achievement tests 
taken at various grade levels. 

One of the criterion institutions is a liberal arts college in a state university
and the other an A & M college (during 
the years considered in this study). The 
admissions policies of the two institutions 
were essentially the same during the 1949 
to 1953 period; students were admitted 








if they had graduated from an accredited 
high school or if they could demonstrate 
competency on an entrance examination. 
If the same prediction equations are satis- 
factory for use at both schools, support 
will have been derived for generalizing
the results to other relatively unselected 
four-year institutions. 

The data that were collected, when 
available, included the following: results 
of the ITBS for Grades 4, 6, and 8; results 
of the ITED for Grades 9 through 12; 
rank in high school graduating class; high 
school grade-point average (GPA); fresh- 
man college GPA; four-year college GPA 
for graduates; and cumulative college GPA 
for students withdrawing from college. 


Prediction of General College Success 


The criteria used for the prediction of 
general college success were freshman GPA 
and four-year cumulative GPA for gradu- 
ates. Prediction equations for freshman 
GPA were obtained separately for SUI 
and ISC, for males and females, for 
students from three high school size 
groups, and for the total sample. Pre- 
diction equations for four-year GPA were 
obtained separately for SUI, ISC, and the 
total sample of graduates. The inde- 
pendent variables employed were the 
Grade 12 ITED, rank in graduating class, 
and high school GPA. 


Prediction of Freshman GPA 


Correlations with freshman college GPA 
are summarized in Table 1. Many previous 
investigators (Durflinger, 1943; State Uni- 
ver. Iowa, 1940) have reported rank-in- 
class as the best predictor of college suc- 
cess, but in this study high school GPA 
consistently yields higher zero-order cor- 
relations. Correlations for ITED and high 
school GPA fluctuate only slightly from 
sample to sample while the predictive 
power of rank-in-class is markedly lower 
for the ISC and small high school samples. 
These two samples overlap since many 
ISC students are from small high schools. 
The relatively lower predictive power of 
rank-in-class is likely due to its question- 
able meaning when small graduating 
classes are involved. GPA’s provide at 
least a rough indication of level of achieve- 
ment, and are undoubtedly more stable 
TABLE 1
CORRELATIONS WITH FRESHMAN COLLEGE GRADE-POINT AVERAGE

Sample                        N      ITED     ITED      HS      HS PR-      Multiple
                                     (Wtd)    Unwtd C   GPA     in-Class    R*
Total                        3202    .634     .600      .669    .554        .713
ISC                          1695    .644     .612      .672    .484
SUI                          1507    .629     .608      .673    .640
Females                      1063    .690     .673      .683    .527        .750
Males                        2139    .613     .589      .648    .543
Large Schools (above 200)    2373    .637     .612      .698    .594        .729
Medium Size HS (101-200)      450    .634     .501      .632    .504        .680
Small Size HS (1-100)         379    .638     .501      .633    .404        .687

* Multiple correlation of ITED unwtd C, HS GPA, and HS PR-in-class.


from school to school than is rank-in- 
class. 

It should be noted that the correlations 
for females are substantially higher than 
the corresponding correlations for males. 
This is consistent with the results of 
other reported research (Durflinger, 1943; 
Jackson, 1955). 

The regression weights and zero-order 
correlations for the ITED subtests based 
on the total sample are presented in Table 
2. Test 5, Ability to Interpret Reading
Materials in the Social Studies, yields the
largest zero-order correlation, .547, and 
Test 3, Correctness and Appropriateness 
of Expression, has the largest regression 
weight in the prediction equation. 
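The raw-score (b) weights and constant in Table 2 define a linear prediction equation, predicted GPA = a + Σ bᵢXᵢ. Below is a minimal sketch of applying it. It assumes the eight subtest scores are entered in the order Tests 1 to 8, on the same ITED score scale used in the study. The weights are copied as printed in Table 2; the function name and the example student are hypothetical, so the output only illustrates the arithmetic.

```python
# Raw-score (b) weights for ITED Tests 1-8 and the constant a, as printed in Table 2
B_WEIGHTS = [0.021104, 0.001326, 0.048323, 0.017367,
             0.017923, 0.010187, 0.001372, -0.009884]
INTERCEPT = 0.01285


def predict_freshman_gpa(subtest_scores):
    """Predicted freshman GPA = a + sum(b_i * X_i) over the eight subtests."""
    if len(subtest_scores) != len(B_WEIGHTS):
        raise ValueError("expected one score for each of the eight ITED subtests")
    return INTERCEPT + sum(b * x for b, x in zip(B_WEIGHTS, subtest_scores))


# Hypothetical student with a score of 20 on every subtest
print(round(predict_freshman_gpa([20] * 8), 3))  # 2.167
```

The b weights depend on each subtest's score scale, so the standardized (β) weights, not the b weights, are the ones to compare when asking which subtest contributes most.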

When the ITED were used as the in- 
dependent variables, the prediction equa- 
tions for the SUI, ISC, and total samples 
were very similar. To investigate the pos- 
sibility of using one equation for both 
institutions, the regression weights for the 
total sample were applied separately to 
the SUI and ISC data to obtain new 
multiple correlations. Similarly, SUI 
weights were applied to the ISC data and 
ISC weights were applied to the SUI data. 
A comparison of the obtained multiple 
correlation indices is presented in Table 
3. 










TABLE 2
PREDICTION OF FRESHMAN GPA FOR THE TOTAL SAMPLE FROM GRADE 12 ITED* SUBTESTS

Independent Variables      r with
(ITED)                     Criterion    β          b           t
Test 1 Soc                 .504         .14878     .021104      5.28
Test 2 Nat S               .418         .00906     .001326       .44
Test 3 Exp                 .541         .30365     .048323     12.10
Test 4 Q                   .471         .14258     .017367      8.69
Test 5 Soc St R            .547         .13066     .017923      4.48
Test 6 NSR                 .531         .08038     .010187      2.54
Test 7 Lit                 .483         .00084     .001372       .34
Test 8 G V                 .467        −.03846    −.009884     −1.98

t (.05) = 1.96      a = .01285      df = 3193

* The titles of the ITED subtests are as follows:

Test 1-Understanding of Basic Social Concepts

Test 2-General Background in the Natural Sciences

Test 3-Correctness and Appropriateness of Expression

Test 4-Ability to Do Quantitative Thinking

Test 5-Ability to Interpret Reading Materials in the Social Studies

Test 6-Ability to Interpret Reading Materials in the Natural Sciences

Test 7-Ability to Interpret Literary Materials

Test 8-General Vocabulary.


TABLE 3
CORRELATIONS BETWEEN THE GRADE 12 ITED AND FRESHMAN GPA USING VARIOUS REGRESSION WEIGHTS

                   Sample Used to Determine Regression Weights
Data               Sample for       Sample for       Both Schools
                   This School      Other School     Combined
ISC Sample         .644             .636             .642
SUI Sample         .629             .622             .627





The use of the total sample weights 
resulted in small decreases, not statisti- 
cally significant, in the correlations. The 
use of weights from one school with data 
from the other resulted in statistically 
significant decreases in the regression sum 
of squares, but the decreases in the correlation indices were negligible. Some reduction, even for correlations from “own school” weights, would be expected in cross-validation. These results indicate that one prediction equation can be applied satisfactorily to students attending either of these schools.
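Each entry in Table 3 amounts to computing predicted GPAs with one set of regression weights and correlating them with the observed GPAs of a given sample. Below is a minimal sketch under that reading; the function names, weights, and data are hypothetical. Adding a constant to every prediction leaves the correlation unchanged, so only the weights, not the intercept, affect these indices.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def predict(weights, intercept, rows):
    """Apply one set of regression weights to each student's predictor scores."""
    return [intercept + sum(w * x for w, x in zip(weights, row)) for row in rows]


def validity_with_weights(weights, intercept, rows, criterion):
    """Correlation between predicted and observed criterion (e.g. freshman GPA)."""
    return pearson_r(predict(weights, intercept, rows), criterion)


# Hypothetical: weights derived at one school, applied to another school's students
other_rows = [[1, 0], [0, 1], [2, 2], [3, 1]]
other_gpa = [1.5, 2.0, 4.0, 3.5]
print(round(validity_with_weights([1.0, 2.0], 0.0, other_rows, other_gpa), 3))
print(round(validity_with_weights([2.0, -1.0], 0.0, other_rows, other_gpa), 3))
```

The first set of weights happens to fit these made-up data perfectly; the second shows how weights from a different sample can lower the correlation.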

In order to study the relationship of 
prediction efficiency to grade of testing, 
the ITED for each of the four high 
school years and ITBS for Grades 4, 6, 
and 8 were correlated with freshman GPA. 
The results are presented in Table 4. It 
can be noted that correlations gradually 
increase as testing is done later in school. 

It should be noted that the nature of the 
base sample necessarily restricted the 
range of scores on the predictor instru- 
ments. To estimate the correlations that 
would have resulted from a representative 
sample of elementary and secondary stu- 
dents, some of the obtained correlations 
were corrected for the restriction in range 
of talent. Standard deviations on the pre- 
dictor instruments were obtained for the 
samples actually used and for the respec- 
tive norms samples, and the obtained cor- 
relations were corrected accordingly. The 
corrected correlations are also presented 
in Table 4. 
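The article does not print the correction formula it used. The form assumed below, R = sqrt(1 - (s^2/S^2)(1 - r^2)), where s and S are the predictor standard deviations in the obtained sample and in the norms sample, is our assumption. Applied to the tabled values it reproduces the corrected entries of Table 4 (.71, .85, .60) and the four-year estimate of about .69 given later, using R for the Grade 12 ITED and r for the ITBS rows. The more familiar Thorndike Case 2 formula would give a somewhat lower Grade 12 value (about .67).

```python
from math import sqrt


def correct_for_restriction(r, sd_sample, sd_norms):
    """Estimate the correlation in the norms population from a restricted sample.

    Assumed form: R = sqrt(1 - (s**2 / S**2) * (1 - r**2)), with s the
    predictor SD in the obtained sample and S the predictor SD in the norms.
    """
    return sqrt(1 - (sd_sample / sd_norms) ** 2 * (1 - r ** 2))


# (obtained correlation, sample SD, norms SD), taken from Table 4 and the text.
rows = {
    "Grade 12 ITED, freshman GPA (R)": (0.634, 5.28, 5.80),
    "Grade 8 ITBS, freshman GPA (r)": (0.484, 9.02, 15.2),
    "Grade 4 ITBS, freshman GPA (r)": (0.420, 9.10, 10.3),
    "Grade 12 ITED, four-year GPA (r)": (0.523, 4.94, 5.80),
}
for label, (r, s, big_s) in rows.items():
    print(f"{label}: {correct_for_restriction(r, s, big_s):.2f}")
```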

The corrected correlation for Grade 8 test scores is higher than that for Grade 12. During the years included in this study, many students left school during high school, and as a result, the Grade 8 sample is undoubtedly less selected than the Grade 12 sample. These correlations suggest that elementary school test scores can provide useful predictions of college success. The predictive power of tests is usually considered only for actual college entrants. However, estimated correlations for samples comparable to norm groups are of use to public school counselors and administrators who are in a position to suggest or implement differential programs for their students.

TABLE 4
CORRELATIONS OF ITED AND ITBS WITH FRESHMAN COLLEGE GPA, ACTUAL AND CORRECTED

Grade   Test      N     SD(a)   r(b)    R(c)   SD (Norms)   Corrected
 12     ITED    3202    5.28    .609    .634      5.80         .71
 11     ITED    2044    4.87    .607    .637
 10     ITED    2170    4.61    .568    .588
  9     ITED    1234    4.47    .542    .567
  8     ITBS    1076    9.02    .484    .490     15.2          .85
  6     ITBS     772   11.93    .494    .506
  4     ITBS     581    9.10    .420    .450     10.3          .60

(a) Standard deviation of composite scores on predictor test (ITBS and ITED use different scales).
(b) Unweighted composite.
(c) Weighted composite (least squares weights derived separately for each sample).

A sample (N = 931) was obtained of students whose ITED composite scores for all high school years were available. The four unweighted composite scores were used as independent variables for the prediction of freshman GPA. The zero-order correlation of the Grade 12 composite with the criterion was .622 for this sample. Adding ITED results for the other three years increased the correlation only to .629.

For 229 students Grades 4 and 8 ITBS 
composites and Grades 9 and 12 ITED 
composites were available. The zero-order 
correlation of the Grade 12 ITED com- 
posite with the criterion was .633 for 
this sample. The addition of the other 
composites raised the multiple correlation 
only to .642. 
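Why the additional composites add so little can be seen from the two-predictor formula for the multiple correlation, R^2 = (r1y^2 + r2y^2 - 2 r1y r2y r12) / (1 - r12^2). The article does not report the intercorrelations among the yearly composites, so the value of r12 below is hypothetical.

```python
from math import sqrt


def multiple_r(r1y, r2y, r12):
    """Multiple correlation of y on two predictors, from zero-order correlations.

    r1y, r2y: each predictor's correlation with the criterion;
    r12: the correlation between the two predictors.
    """
    r_squared = (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)


# Hypothetical values: a Grade 12 composite correlating .622 with GPA, an
# earlier composite correlating .58, and the two composites correlating .85.
print(f"R = {multiple_r(0.622, 0.58, 0.85):.3f}")
# With uncorrelated predictors the same validities would combine far more.
print(f"R = {multiple_r(0.622, 0.58, 0.0):.3f}")
```

When the predictors are highly intercorrelated, as successive administrations of the same battery are likely to be, most of what the earlier composite predicts is already carried by the latest one.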

Prediction of Four-Year Cumulative College Grade-Point Average


Separate samples were established of college graduates for whom ITED results were available for each of the high school years. The obtained correlations are presented in Table 5. The correlations with four-year average, though lower than those for freshman average, actually represent similar prediction efficiency when reduction of range through dropouts is considered. The standard deviations of the Grade 12 ITED composites for the norms, freshman, and graduating samples are 5.80, 5.28, and 4.94, respectively. When the correlation with four-year average is corrected for variability of scores, the estimated correlation is approximately .69 for a sample comparable in variability to the norms sample.

TABLE 5
CORRELATION OF ITED WITH FOUR-YEAR COLLEGE GPA FOR TOTAL SAMPLE

Grade   Test      N       r       R
 12     ITED    1426    .523    .535
 11     ITED     935    .537    .553
 10     ITED     994    .501    .512
  9     ITED     552    .488    .507

TABLE 6
CORRELATIONS OF GRADE 12 ITED UNWEIGHTED COMPOSITE, RANK-IN-CLASS, AND HIGH SCHOOL GPA WITH FOUR-YEAR GPA

Sample     N     ITED Composite   Rank-in-Class   High School GPA    R
ISC       777        .467
SUI       649        .584             .600
Total    1426        .523             .391             .587         .634

The correlations of Grade 12 ITED 
unweighted composite, rank-in-class, and 
high school GPA with four-year average 
are presented in Table 6. Several similari- 
ties can be noted between the predictions 
of freshman average and four-year aver- 
age. Again, high school GPA yields the 
most accurate predictions in all samples, 
and rank-in-class yields a low correlation 
for the ISC sample. The multiple correla- 
tion of these three predictors for the total 
sample is .634. The regression weight for 
rank-in-class is not statistically significant. 

For 425 college graduates for whom 
ITED results were available for all four 
high school years, the Grade 12 ITED 
composite yielded the highest zero-order 
correlation, .548. Adding the other three 
composites raised R only to .555. 


Prediction of Graduation-Elimination 


The purpose of this part of the study was to investigate the accuracy with which unsuccessful college students could be identified before matriculation. After eliminating from consideration students (N = 1306) who withdrew from college with satisfactory records or who were still in school, the sample was dichotomized; one group consisted of graduates (N = 1426), and the other consisted of eliminees (N = 470). Eliminees were defined as students who left school while their grades were not satisfactory in terms of school requirements. The Grade 12 ITED subtest scores were used as independent variables for separate prediction problems with the SUI, ISC, and total samples. The obtained biserial correlations were .532, .512, and .525, respectively. In all three equations, Test 3, Correctness and Appropriateness of Expression, had the largest regression weight.
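The article reports biserial correlations against the graduate/eliminee dichotomy but does not show the computation. A minimal sketch of the standard biserial coefficient for a single continuous score follows, assuming (as the coefficient does) a normally distributed variable underlying the dichotomy. The article's values come from regression composites; the scores below are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial(scores, succeeded):
    """Biserial correlation between a continuous score and a pass/fail outcome.

    Assumes the dichotomy splits an underlying normally distributed variable.
    r_b = (M_p - M_q) / s_t * (p * q / y), where y is the normal ordinate at
    the point dividing the proportions p and q.
    """
    upper = [s for s, ok in zip(scores, succeeded) if ok]
    lower = [s for s, ok in zip(scores, succeeded) if not ok]
    p = len(upper) / len(scores)
    q = 1 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(upper) - mean(lower)) / pstdev(scores) * (p * q / y)


# Hypothetical predictor scores for ten students (True = graduate).
scores = [12, 15, 9, 20, 18, 7, 14, 22, 11, 16]
graduated = [True, True, False, True, True, False, False, True, False, True]
print(f"biserial r = {biserial(scores, graduated):.3f}")
```

For the article's split, p would be 1426/1896, about .75. A biserial coefficient can exceed 1.0 when the normality assumption does not hold.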

High school GPA, rank-in-class, and the Grade 12 ITED composite were combined in a multiple correlation problem for predicting graduation-elimination with the total sample. The obtained correlation index was .604. Of these three variables, high school GPA had the highest correlation with the criterion, the index being .577, compared with .532 for rank-in-class and .472 for the ITED composite.
SUMMARY 


Various measures of achievement and attainment were correlated with freshman GPA, four-year cumulative college GPA, and graduation-elimination for samples of students entering Iowa State College and the State University of Iowa during the period 1949-1953, inclusive. The major findings include:

1. The accuracy with which general college academic success was predicted from achievement test scores increased year by year from Grade 4 through high school; the Grade 12 ITED yielded multiple correlations of .634 with freshman college GPA and .535 with four-year GPA.
2. Combinations of achievement test data obtained at several points in students' careers were only slightly more predictive than the most recent results.

3. High school GPA was the best single predictor of college success, yielding correlations of .67 and .59 with freshman and four-year GPA, respectively. Rank-in-class was not highly predictive for graduates of small high schools.

4. ITED prediction equations derived 
for one college were only slightly less 
accurate than “own school” equations 
when applied to data for the other col- 
lege. The slight decreases in prediction 
accuracy suggest that these equations 
could be used satisfactorily at other four-year institutions with similar admissions
policies. 

5. When restriction in range of scores is 
considered, elementary school test data 
correlate highly with college success. The 
estimated correlation between Grade 8 
ITBS and freshman GPA for a sample 
representative of eighth grade students 
was .85. This finding suggests that pre- 
dictions of college success from elemen- 
tary school test scores can be as useful as 
predictions from high school data. 
EFFECTS OF THREE VARIABLES IN A TEACHING MACHINE
JOHN E. COULSON and HARRY F. SILBERMAN


System Development Corporation 


Recently a number of automated teaching devices, or "teaching machines," have been developed to assist human instructors in the fields of education and training. A "teaching machine" may be defined as a device which (a) presents a series of problem materials (questions) to a student; (b) requires the student to respond to each question by some overt behavior; and (c) provides the student with knowledge of results of his behavior, usually immediately following each response.

The purpose of the present study is to 
investigate three variables related to the 
design of teaching machines: 

1. Student Response Mode. The multiple choice and constructed response* modes
of operation are compared for teaching 
effectiveness. Considerable discussion of the 
relative merits of these modes can be found 
in the literature (Pressey, 1959; Skinner, 
1958). 

2. Size of Item Step. Most investigators 
in the field of automated teaching agree 
that optimal learning occurs when items 
presented to the student give the student 
a large degree of support (Glaser & 
Homme, 1958; Porter, 1958; Skinner, 
1958). According to these investigators, 
the items should be worded and sequenced 
in such a way that the student finds it 
simple to proceed from one item to the 
next, and thus receives a large percentage 
of positive reinforcements. In the present 
study, an attempt was made to evaluate 
experimentally the extent of the purported 
superiority of the small step condition. 

3. Item Sequence Control. The great majority of existing teaching machines present subject material items in an essentially predetermined sequence. In the present study a predetermined item sequence was compared with a more adaptive procedure (called "branching") in which a student who had already learned a concept was allowed to skip certain other items covering the same concept.

*The term "constructed response" refers here to the procedure in which the student must complete statements by filling in (constructing) one or more missing words.


METHOD 


The experiment consisted of three 
phases: (a) A training session for each of 
80 experimental Ss with a ‘teaching ma- 
chine’ operating under one of eight differ- 
ent teaching procedures being compared; 
(b) A criterion test given to each S immediately following the training session; and
(c) The same criterion test, given ap- 
proximately three weeks after the training 
sessions to the 80 Ss and also to a control 
group of 104 students who were in the 
same classes with the Ss, but who did not 
receive teaching machine training. 

Although the teaching machine training 
session for any single experimental S re- 
quired less than two hours, the sessions for 
all Ss extended over a period of one week. 
During this week, the training sessions re- 
placed regular classroom studies for the 
experimental Ss, while the control group 
continued to receive its normal instruc- 
tion. Subject material for the classroom 
instruction during this week, and up to 
the final administration of the criterion 
test, was not closely related to the material 
taught with the teaching machines. 

The purpose of the control group was 
to provide a comparison between experi- 
mental Ss and other students who did not 
receive any type of training on the par- 
ticular concepts covered by the Ss in their 
teaching machine training. From such a 
comparison it is possible to determine 
whether significant learning has occurred 
in the experimental group as a whole. 
The following experimental performance measures were recorded for each S, regardless of mode of instruction: (a) time required to finish training; (b) score on multiple choice portion of criterion test; (c) score on constructed response portion of criterion test.

A psychology pretest administered prior 
to the experiment provided a measure of 
individual differences in initial achieve- 
ment among the students at the start of 
the experiment. This pretest was given as 
part of the regular course work to all 
students in the psychology classes from 
which Ss and controls were drawn, and 
was not closely related to the experimental 
materials. Pretest items were concerned 
primarily with the history of psychology. 


Apparatus 


Six identical sets of equipment were used 
in the experimental training sessions. The 
sets constituted manually controlled teach- 
ing machines, with human experimenters 
used in place of automatic control mecha- 
nisms. Each set consisted of a wooden 
screen, a number of pushbuttons and lights 
used for communication between experi- 
menter and S, a deck of 5 x 8 cards con- 
taining instructional items, and three 
sheets (panels) with information relating 
to the instructional items. The wooden 
screen had a window in its center in which 
the E placed cards so the instructional 
items could be seen by the S. Information 
sheets were attached to the S’s side of the 
screen during experimental training ses- 
sions, and could be used by the S as an aid 
in answering the instructional item; Ss 
could not use the sheets while taking the 
criterion quiz, however. 

Each of the six teaching machines could be operated under any one of eight teaching procedures. These eight teaching procedures represent the eight combinations of three experimental variables, each variable having two possible values as follows: (a) response mode (multiple-choice versus constructed response); (b) size of steps between successive items to be taught (small steps versus large steps); and (c) type of item sequence control (branching versus nonbranching).


Procedures 


Ss and Es were randomly assigned to the eight experimental training conditions, using random number tables, with the restriction that 10 Ss were trained under each condition. Es were reassigned after each session, so that no E was restricted to a single experimental condition.
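The restricted randomization just described (10 Ss in each of the eight conditions) can be sketched as follows. Python's pseudo-random generator stands in for the random number tables, the subject numbers 1 to 80 are hypothetical, and the reassignment of Es is not modeled.

```python
import itertools
import random


def assign_subjects(subject_ids, conditions, per_condition, seed=None):
    """Randomly assign subjects so each condition gets exactly per_condition."""
    subjects = list(subject_ids)
    if len(subjects) != len(conditions) * per_condition:
        raise ValueError("need exactly len(conditions) * per_condition subjects")
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Consecutive slices of the shuffled list become the condition groups.
    return {
        cond: subjects[i * per_condition:(i + 1) * per_condition]
        for i, cond in enumerate(conditions)
    }


# The eight conditions are the 2 x 2 x 2 crossings of the three variables.
conditions = list(itertools.product(
    ("constructed response", "multiple-choice"),
    ("small step", "large step"),
    ("branching", "nonbranching"),
))
groups = assign_subjects(range(1, 81), conditions, per_condition=10, seed=42)
for cond, members in groups.items():
    print(cond, sorted(members))
```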

Printed instructions were read aloud to 
each S before the start of the training ses- 
sion. These instructions described the task 
to be performed by the S, and informed 
him that both accuracy and speed of learn- 
ing were important. 

Constructed Response versus Multiple-Choice Operation. In the constructed response mode of operation the E placed in the window of the screen a card with a constructed response item containing one or more blank spaces to be filled in by the S. After the S wrote his answer on a slip of paper, the E revealed to the S the correct answer to the item. The S compared his own answer against the correct answer, and signalled by pushbutton whether or not he got the item correct.* The E removed this card and went on to the next card, continuing until he had finished the deck. He then repeated the entire process, but this time used only the items missed by the S on the first trial. The procedure was continued through successive trials until the S had answered every item correctly.

*An informal check was made to determine the dependability of the Ss' evaluation of their own responses. Several randomly selected Ss were observed during training, and after training their slips were compared with the correct answers. Results of the check indicated that Ss were generally accurate in their evaluations.

In the multiple-choice mode the S used one of five pushbuttons to indicate his answer to the multiple choice items presented to him. In addition, there were two pushbuttons used by the E to tell the S whether he had selected a correct or incorrect response. The S was required to continue to respond to an item until he had selected the correct answer, upon which the E replaced that item with the next in the deck. After he had completed the first trial (i.e., the first run through the deck), the E continued with additional trials as required, each time using only the cards that the S missed in his first response on the preceding trial.
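The two trial procedures differ in what carries forward to the next trial: in the constructed response mode an item returns if the S's self-scored answer was wrong, while in the multiple-choice mode the S keeps pressing buttons until correct and only items missed on the first press return. A sketch of that control logic follows; the callback functions standing in for the S, the three-card deck, and the trial and attempt guards are all hypothetical.

```python
from typing import Callable, Dict, List

MAX_TRIALS = 100    # guard for the sketch; the study simply ran until done
MAX_ATTEMPTS = 50   # guard against a simulated S who never finds the key


def constructed_response_trials(deck: List[str],
                                self_scored_correct: Callable[[str, int], bool]) -> int:
    """Re-present only the cards missed on each trial; return trials needed."""
    remaining = list(deck)
    trial = 0
    while remaining:
        trial += 1
        if trial > MAX_TRIALS:
            raise RuntimeError("learner never finished the deck")
        remaining = [c for c in remaining if not self_scored_correct(c, trial)]
    return trial


def multiple_choice_trials(deck: List[str], key: Dict[str, int],
                           press: Callable[[str, int, int], int]) -> int:
    """S presses until correct; cards missed on the first press of a trial
    are carried into the next trial. Returns trials needed."""
    remaining = list(deck)
    trial = 0
    while remaining:
        trial += 1
        if trial > MAX_TRIALS:
            raise RuntimeError("learner never finished the deck")
        missed_first = []
        for card in remaining:
            attempt = 1
            while press(card, trial, attempt) != key[card]:
                if attempt == 1:
                    missed_first.append(card)
                attempt += 1
                if attempt > MAX_ATTEMPTS:
                    raise RuntimeError("no correct button press")
        remaining = missed_first
    return trial


# Hypothetical learner: card c is answered right once trial > difficulty[c].
difficulty = {"a": 0, "b": 1, "c": 2}
key = {"a": 0, "b": 3, "c": 4}
cr_trials = constructed_response_trials(list(difficulty), lambda c, t: t > difficulty[c])
mc_trials = multiple_choice_trials(
    list(difficulty), key,
    # First press is wrong until the card is learned; a second press is always right.
    lambda c, t, a: key[c] if (t > difficulty[c] or a > 1) else (key[c] + 1) % 5,
)
print(cr_trials, mc_trials)
```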

Small Steps versus Large Steps. Some Ss 
were trained with instructional decks con- 
taining many (104) items. These decks 
were called Small Step decks, because they 
supposedly require little effort on the part 
of the S to answer a particular item once 
he has gone through the preceding items. 
Other Ss were trained with Large Step 
decks containing fewer (56) items, and 
ostensibly requiring more effort to answer 
successive items. 

From the standpoint of the operation of 
the teaching machines, there were no pro- 
cedural differences between Small Step 
decks and Large Step decks, except in the 
number of items presented to the S. 

Branching versus Nonbranching. This 
variable determined the sequence in which 
items were presented to the S. In the Non- 
branching mode, the E simply went 
through all the cards of the deck in order. 
In the Branching mode, certain items were 
removed from the deck if the S answered 
certain other items correctly on the first 
try. The items to be removed were taken 
out of the deck by a second E, so they
were never presented to the S. Thus, in 
the Branching mode, the exact number 
and sequence of items were not fixed, but 
depended on the performance of the S. 
A more complete description of the criteria 
for omitting items is given in the follow- 
ing section on instructional materials. 
Instructional Materials

The eight item sets (decks) and the 
three instructional sheets used in the ex- 
periment were based on a portion of a 
college course in elementary psychology 
used at Harvard University. The first item 
set obtained was the constructed response, 
Small Step deck, without branching. This 
set consisted of the first 104 items from a 
larger series of items developed at the 
Harvard University Psychological Labora- 
tory. The remaining seven sets represent 
combinations of the following transforma- 
tions of the original items: 

1. A number of possible alternative an- 
swers were provided for each item, to 
make the Multiple-Choice decks. 

2. Certain items were eliminated from 
the original set of 104 items, to make the 
Large Step decks of 56 items. The items 
eliminated were judged to be largely re- 
dundant, in the sense that they related to 
concepts already covered by other items, 
and added little extra information. 

3. To make the Branching decks, coded 
instructions for the branching procedure 
were marked on the backs of certain cards. 
These instructions indicated which cards 
were to be eliminated from the deck dur- 
ing the experiment if the S correctly an- 
swered the marked cards. For convenience, 
a marked card will be referred to as a 
Branch card; the cards to be eliminated 
when a Branch card is correctly answered 
will be called Conditional Skip cards. In 
general, a Branch card was the first of 
several items in the deck covering a par- 
ticular concept, while the associated Con- 
ditional Skip cards consisted of the re- 
maining items covering essentially the 
same concept. Small Step branching decks 
included 13 Branch cards and 39 Condi- 
tional Skip cards, while Large Step 
branching decks included 12 Branch cards 
and 22 Conditional Skip cards. 
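The Branch card rule can be expressed as a filter applied while the deck is being run. The card numbering below is hypothetical (the article gives only the counts of 13 Branch and 39 Conditional Skip cards in the Small Step deck), and the sketch models a single pass through the deck.

```python
from typing import Callable, Dict, List, Set


def first_run_branching(deck: List[int], skips: Dict[int, Set[int]],
                        correct_first_try: Callable[[int], bool]) -> List[int]:
    """Cards actually shown on one run through a Branching deck.

    skips maps each Branch card to its Conditional Skip cards; when a Branch
    card is answered correctly on the first try, those cards are pulled from
    the deck (by the second E) and never shown.
    """
    pulled: Set[int] = set()
    shown: List[int] = []
    for card in deck:
        if card in pulled:
            continue
        shown.append(card)
        if card in skips and correct_first_try(card):
            pulled |= skips[card]
    return shown


# Hypothetical Small Step layout: 104 cards; cards 1, 5, ..., 49 are the 13
# Branch cards, each followed by its 3 Conditional Skip cards (39 in all).
deck = list(range(1, 105))
skips = {b: {b + 1, b + 2, b + 3} for b in range(1, 53, 4)}
print(len(first_run_branching(deck, skips, lambda card: True)))   # 65 cards shown
print(len(first_run_branching(deck, skips, lambda card: False)))  # 104 cards shown
```

With every Branch card answered correctly the S sees 104 - 39 = 65 cards; the corresponding minimum for the Large Step branching deck would be 56 - 22 = 34.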

The three instructional sheets, or "panels," used in the study covered most of the major concepts presented in the instructional items, but in the form of definitions and brief descriptive materials.


Criterion Test 


The criterion test consisted of 36 ques- 
tions, of which 19 were constructed re- 
sponse (fill-in) and 17 were multiple- 
choice. All of the questions were based 
upon the material contained in the origi- 
nal (constructed response; small step; 
without branching) set of instructional 
items. The person who prepared the cri- 
terion items had not seen the other seven 
item sets prepared for instruction; con- 
versely, the persons preparing the alter- 
nate item sets had not seen the criterion 
questions. 

None of the criterion questions were 
duplicates of instructional items, though 
many of the same words were used as cor- 
rect responses. Most of the criterion ques- 
tions were of the application type, in which 
a situation is presented and the respondent 
identifies the principle involved or at- 
tempts to explain or predict an outcome. 

Reliability of the criterion test, as estimated by the application of Kuder-Richardson Formula 20 to the scores of the experimental Ss, was as follows: (a) whole test, .89; (b) constructed response portion, .85; and (c) multiple-choice portion, .79. Test-retest correlations for experimental Ss were .81 for the whole test, .79 for the constructed response portion, and .58 for the multiple choice portion.
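For reference, KR-20 is k/(k - 1) [1 - Σpq / σ²], where k is the number of items, p the proportion answering an item correctly, q = 1 - p, and σ² the variance of total scores. A minimal sketch on a hypothetical response matrix follows (the study's item-level data are not reported); both p and σ² use N as the denominator.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for items scored 0 (wrong) or 1 (right).

    responses: one row per examinee, one 0/1 entry per item.
    KR-20 = k/(k - 1) * (1 - sum(p*q) / variance of total scores).
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


# Hypothetical 4 examinees x 3 items, not data from the study.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(f"KR-20 = {kr20(toy):.2f}")
```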


Experimental Subjects 


The 80 experimental Ss and the 104 
members of the control group were taken 
from beginning psychology classes in Santa 
Monica City College. They had been in 
the psychology course for about one 
month, using F. Ruch’s Psychology and 
Life as a text and covering some general 
background in the area of psychology, but 
had not been given specific instruction in 
any material closely related to the subject 
material used in the experiment. 


RESULTS 


1. Table 1 provides a comparison of mean scores and standard deviations for experimental Ss and controls on the psychology pretest. No significant difference was obtained between the 80 Ss and the 104 controls. The maximum score possible on the pretest was 61.

TABLE 1
COMPARISON OF SCORES FOR EXPERIMENTAL AND CONTROL GROUPS ON PRETEST AND CRITERION TESTS

                                   Experimental      Control
Source of Scores                   Mean     SD     Mean     SD       t      df    P
Pretest                            45.51   8.00    44.41   7.71    1.00    182   Not significant
Criterion test, first administration:
  Constructed response             13.50   3.92     7.53   3.62   11.06    195   .01
  Multiple choice                  11.62   2.50     9.44   2.36    6.06    195   .01
  Total criterion                  25.12   5.93    16.97   5.23    9.99    195   .01
Criterion test, second administration:
  Constructed response             13.40   3.39    Not administered    Not applicable
  Multiple choice                  12.40   2.52
  Total criterion                  25.80   5.30

2. A comparison of mean scores and 
standard deviations for experimental Ss 
and controls on the posttraining criterion 
test is shown in Table 1. This criterion test 
consisted of two subtests, one with multi- 
ple choice items, the other with con- 
structed response items. Maximum pos- 
sible scores were 19 and 17 for the 
constructed response and multiple choice 
subtests respectively. Table 1 shows that 
the experimental group was superior to 
the control group on the total criterion 
and on both the multiple choice and con- 
structed response subtests. Differences 
were significant at the .01 level. 
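A pooled-variance t test computed from summary statistics is one way to make comparisons like those in Table 1. The article does not state which form of the t test was used, so the sketch below is an assumption.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t from summary statistics, pooled variance.

    Returns (t, df) with df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Pretest row of Table 1: experimental (N = 80) versus control (N = 104).
t, df = pooled_t(45.51, 8.00, 80, 44.41, 7.71, 104)
print(f"t = {t:.2f}, df = {df}")
```

For the pretest row this gives t of about 0.94 on 182 df, well short of significance and consistent with the conclusion drawn from Table 1, though not identical to the printed 1.00.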

3. The criterion test was readministered 
to the experimental Ss three weeks after 
the first administration as a measure of 
retention. No significant difference was 
obtained on the total criterion (multiple- 
choice plus constructed response), when 
the mean of the first administration was 
compared with the mean of the second 
administration. On the multiple choice 
criterion subtest, however, retest scores 
were significantly higher (.01 level) than 
original test scores.

The evidence from these first three anal- 
yses indicates that the automated teaching 
materials used with the experimental Ss 
resulted in significant learning of the con- 
cepts taught and that this learning is re- 
tained at least for a three week period. 
These findings do not mean that the ex- 
perimental Ss exceeded groups receiving 
conventional classroom instruction on the 
same materials, since the control group 
was not taught the same concepts as the 
experimental group. 

4. Whereas Table 1 concerns differences in performance between the control group and the experimental Ss as a whole, the remaining tables in this report relate to differences among the various experimental treatments. Table 2 shows the results of covariance analyses of the criterion test scores for the eight experimental groups, with pretest scores used as a control variable. The desirability of using the pretest control is indicated by the correlations between pretest and posttest scores, which were as follows: (a) .411 between pretest and constructed response criterion scores, (b) .404 between pretest and multiple choice scores, and (c) .442 between pretest and total criterion.

(a) Constructed Response Criterion Subtest. The analysis of covariance represented in Table 2 shows a significant main
effect for the size of step factor (.05 level 
of significance), in the direction of higher 
scores for small step trainees. A significant 
interaction was also found between the 
mode of response and branching variables 
(.05 level). The assumption of homogene- 
ity of regression was tested and found to 
be satisfied. 

Table 3 shows adjusted mean con- 
structed response criterion scores and 
standard deviations for comparison of the 
three experimental variables. It also shows 
the adjusted mean scores and standard 
deviations for the four combinations of 
response mode and branching procedure, 
since this interaction was found significant 
in the covariance analysis. 
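In an analysis of covariance each group's criterion mean is adjusted to the grand mean of the covariate using the pooled within-group regression slope: adjusted mean = Ȳj - b_w (X̄j - X̄). A sketch with hypothetical pretest and criterion pairs follows; the authors' own computation for Table 3 is not shown in the article.

```python
from statistics import mean


def adjusted_means(groups):
    """ANCOVA-adjusted criterion means.

    groups maps a condition name to a list of (pretest, criterion) pairs.
    Each group mean is moved to the grand pretest mean along the pooled
    within-group regression slope b_w.
    """
    sxy = sxx = 0.0
    all_x = []
    for pairs in groups.values():
        xs = [x for x, _ in pairs]
        mx, my = mean(xs), mean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x in xs)
        all_x.extend(xs)
    b_w = sxy / sxx
    grand_x = mean(all_x)
    return {
        name: mean(y for _, y in pairs) - b_w * (mean(x for x, _ in pairs) - grand_x)
        for name, pairs in groups.items()
    }


# Hypothetical (pretest, criterion) pairs for two conditions.
toy_groups = {"A": [(40, 10), (50, 14)], "B": [(50, 12), (60, 16)]}
adj = adjusted_means(toy_groups)
print({name: round(value, 2) for name, value in adj.items()})
```

Here group B's raw criterion mean is higher, but so is its pretest mean; after adjustment the order reverses, which is the kind of correction the pretest control is meant to provide.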

(b) Multiple-Choice Criterion Subtest. 
Covariance analysis yielded no significant 
main effects or interactions among the ex- 
perimental groups on the multiple choice 
subtest. 

(c) Total Criterion Test. No significant 
main effects or interactions were obtained 
among the experimental groups, as deter- 
mined by covariance analysis. 

5. An analysis of variance of differences among the eight experimental groups with respect to the amount of time taken to complete training was also performed. All three main effects are significant at the .01 level. The constructed response training condition took more time than the multiple choice condition; the small step condition took more time than the large step condition; and the nonbranching condition took more time than the branching condition. Interactions were not significant. The mean scores and standard deviations on the time criterion are shown in Table 4, and the analysis of variance is summarized in Table 5.

TABLE 2
COVARIANCE ANALYSIS OF CRITERION SCORES USING A PSYCHOLOGY PRETEST AS A CONTROL
(N = 80)

                            df                 Mean Square                 F
Source of Variation    CR    MC   Total    CR      MC     Total     CR      MC    Total
A: Mode of response     1     1     1     29.22   10.22   72.15    2.48    1.94   2.65
B: Size of step         1     1     1     46.99    9.20   99.96    3.99*   1.74   3.68
C: Branching            1     1     1      1.27     .99    4.65     .11     .19    .17
AB                      1     1     1     11.05   15.66   54.73     .94    2.97   2.01
AC                      1     1     1     53.28     .82   65.84    4.52*    .16   2.42
BC                      1     1     1     18.34     .08   18.38    1.56     .01    .67
ABC                     1     1     1      2.23    1.03    5.57     .19     .20    .2
Within cells           71    71    71     11.93    5.28
Total                  78    78    78

Note. CR = constructed response criterion subtest; MC = multiple-choice criterion subtest; Total = total criterion.
* .05 level of significance.

TABLE 3
ADJUSTED MEAN SCORES ON CONSTRUCTED RESPONSE CRITERION SUBTEST FOR DIFFERENT TRAINING CONDITIONS

Training Mode                                 N    Adjusted Mean    SD
Constructed response                         40       14.14        3.35
Multiple-choice                              40       12.86        4.31
Small step                                   40       14.31         .02
Large step                                   40       12.69        3.72
Branching                                    40       13.21        4.26
No-branching                                 40       13.79        3.58
Branching with constructed response          20       12.93        3.53
Branching with multiple-choice               20       13.50        4.98
No-branching with constructed response       20       15.35        3.36
No-branching with multiple-choice            20       12.92        3.45

TABLE 4
MEAN TIME REQUIRED FOR TRAINING UNDER DIFFERENT MODES

Training Mode            Mean Training Time (min.)    SD (min.)
Constructed response              54.38                 10.64
Multiple-choice                   44.38                 10.92
Small step                        57.28                  9.19
Large step                        41.48                  7.60
Branching                         43.78                  9.94
No-branching                      54.98                 10.55

Note. N for each training mode is 40.

TABLE 5
ANALYSIS OF VARIANCE OF TIME REQUIRED TO COMPLETE TRAINING

Source of Variation         SS        df      ms         F
A (Mode of response)      2000.00      1   2000.00    11.68*
B (Size of step)          4992.80      1   4992.80    29.11*
C (Branching)             2508.80      1   2508.80    14.62*
AB                          42.05      1     42.05      .25
AC                          48.05      1     48.05      .28
BC                         162.45      1    162.45      .95
ABC                        540.80      1    540.80     3.15
Within cells             12347.80     72    171.50
Total                    22642.75     79

* .01 level of significance.
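In a balanced design with two levels per factor, the sum of squares for a main effect can be recovered from the two level means alone: SS = Σ nj (Mj - M)² = N (d/2)², where d is the difference between the level means and N = 80. The sketch below applies this to the Table 4 means. It reproduces the SS column of Table 5 and gives F ratios within about .02 of those printed (for example, 11.66 against 11.68 for mode of response).

```python
def two_level_main_effect(mean_1, mean_2, n_total, ms_within):
    """Sum of squares and F (1 df) for a two-level factor, balanced design.

    With n_total / 2 subjects at each level, SS = n_total * (d / 2) ** 2,
    where d = mean_1 - mean_2.
    """
    d = mean_1 - mean_2
    ss = n_total * (d / 2) ** 2
    return ss, ss / ms_within


ms_within = 12347.80 / 72  # within-cells SS / df, Table 5
for label, m1, m2 in [
    ("A (mode of response)", 54.38, 44.38),
    ("B (size of step)", 57.28, 41.48),
    ("C (branching)", 54.98, 43.78),
]:
    ss, f = two_level_main_effect(m1, m2, 80, ms_within)
    print(f"{label}: SS = {ss:.2f}, F = {f:.2f}")
```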


Discussion 


Constructed Response Criterion Data 


Size of Step Variable. The finding that 
treatment groups given more items (small 
steps) learned more than groups receiving 
few items is in accord with B. F. Skinner’s 
(1958) emphasis on the importance of 
small steps in writing instructional item 
sets. The superiority of small steps in to- 
tal amount learned must be weighed, how- 
ever, against the significantly greater train- 
ing time required by small step trainees 
than by large step trainees. The best 
choice of item step size for any applied 
teaching situation will be determined in 
part by the extent to which required train- 
ing time is a critical consideration. 

Branching Variable. In the comparison 
of large item steps and small steps it was 
found that small step trainees, who were 
given more items, learned more than large 
step trainees. In view of this finding it is 
notable that branching, with its conse- 
quent reduction in total number of items 
presented, did not differ significantly from 
nonbranching in amount taught, though 
there was a suggestive difference in favor 
of nonbranching. Probably an important 
factor in this finding was the way in which 
items were eliminated in the branching 
procedure. An item was skipped only after 
the trainee had demonstrated some knowl- 
edge of the concept taught by that item, 
with the result that every major concept 
was covered by at least one item. At the 
same time, the branching procedure made 
possible a significant decrease in required 
training time, as compared with the non- 
branching procedure. 

When both the amount learned and the required training time are considered, the branching procedure appears to offer an overall advantage over nonbranching. The present investigators feel, moreover, that branching, or other types of control flexibility, offer even greater potential for future improvement in teaching machine effectiveness. In the study reported here, limitations due to human operation of the teaching machine control mechanism placed heavy restrictions on the branching procedure. The greatly simplified skipping procedure resulted in a significant advantage in training time but not in amount learned. More complex forms of control flexibility, using some type of automatic equipment, may prove superior to predetermined sequence control in both amount learned and speed of learning.

Response Mode Variable. Under the 
conditions of the experiment reported here, 
student response mode did not significantly 
affect the amount learned by the students. 
Since required training time was signifi- 
cantly less for multiple-choice trainees 
than for constructed response trainees, the 
overall advantage appears to be with the 
multiple-choice mode. It is possible, of 
course, that the results would be different 
under other experimental conditions, and 
further research in the area may prove 
valuable. 

Response Mode-Branching Interaction. 
The significant interaction between re- 
sponse mode and branching procedure can 
probably be attributed to the combined 
effects of two factors, neither of which is 
sufficiently powerful to have statistically 
significant effect when taken separately. 
The first factor contributing to the inter- 
action is the number of items presented to 
the Ss. By this hypothesis the constructed 
response trainees learned more without 
branching than with branching because the 
branching procedure caused a reduction in 
the total number of items. The second fac- 
tor contributing to the interaction effect 
was the relation between the response 
mode required in the training session, and 
that required in the criterion test. This 
might well be expected to give an advan- 
tage to the constructed response trainees 
due to generalization decrement on the 
part of the multiple-choice trainees. 
Main effect differences in favor of the
nonbranching and constructed response 
conditions, though nonsignificant, tend by 
their direction to support the “combined 
effects” interpretation presented here. 


Multiple-Choice Criterion Data 


Although the mean of the experimental 
Ss on the multiple-choice criterion signifi- 
cantly exceeded the control group mean, 
no significant differences were obtained 
among the different experimental groups. 
Differential effects of the various experi- 
mental treatment combinations on the 
multiple-choice criterion appear to have 
been masked by a larger effect common 
to all experimental groups. This masking 
error variance can probably be attributed 
to the nature of the multiple-choice cri- 
terion subtest. Of the 17 multiple-choice 
items, nine had four alternatives, two had 
three alternatives, and six had only two 
alternatives. Guessing alone would be
expected to yield, on average, about
one-third of the items correct (9/4 + 2/3 +
6/2, or roughly 5.9 of the 17). The largest portion
of the variance of the scores on the multi- 
ple-choice test may be due to this chance 
factor. Students of the sort used in the 
study are usually highly skilled in picking 
out subtle specific determiners by which 
they can eliminate implausible alternatives 
on grounds other than factual content. 
These skills, which together with day-to- 
day fluctuations in individual performance 
are treated as systematic variance in the 
Kuder-Richardson method of estimating 
reliability, contribute to over-estimation 
of the effective reliability of the test. In this 
case, the Kuder-Richardson reliability es- 
timate was .79. A more realistic estimate 
of reliability is given by the multiple- 
choice criterion test-retest correlation, 
which was only .58. 
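The chance figure above follows from summing 1/k over the items, where k is the number of alternatives per item. The sketch below shows that arithmetic together with the textbook Kuder-Richardson formula 20 (KR-20). It assumes KR-20 is the Kuder-Richardson variant the authors meant, which the text does not say. The function names are illustrative, and any 0/1 score matrix passed to `kr20` would be example data, not the study's.

```python
from fractions import Fraction
from typing import List, Sequence


def expected_chance_score(alternatives_per_item: Sequence[int]) -> Fraction:
    """Expected number correct from blind guessing: the sum of 1/k over items."""
    return sum((Fraction(1, k) for k in alternatives_per_item), Fraction(0))


def kr20(item_scores: List[List[int]]) -> float:
    """Kuder-Richardson formula 20 for items scored 0/1.

    item_scores[i][j] is 1 if examinee i answered item j correctly.
    Uses divide-by-N variances, a common textbook convention.
    """
    n_people = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)
```

For the item mix in the text, `expected_chance_score([4] * 9 + [3] * 2 + [2] * 6)` gives 71/12, about 5.9 items, or roughly 35% of 17. Because KR-20 treats any consistent test-wise skill as true-score variance, it can exceed a test-retest coefficient. That is the authors' point about .79 versus .58.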


Total Criterion Test Data 


The results of the analysis of the total 
criterion follow from the results obtained 
on the analyses of the two subtests. Error 
variance in the multiple-choice portion of
the total criterion appears to have masked 
intergroup differences so that these dif- 
ferences are not significant. It can be noted 
in Table 2 that the covariance F test ap- 
proaches but does not reach significance at 
the .05 level for both the size of step fac- 
tor and the mode of response-branching 
interaction. 


Time Criterion 


Differences in required training time 
are generally in the expected direction. 
Small step groups and nonbranching 
groups took longer than large step and 
branching groups because they were re- 
quired to answer more questions. The fact 
that constructed response groups took 
longer than multiple-choice groups indi- 
cates that the composition and writing of 
answers is more time consuming than the 
recognition of a “correct” solution among 
several alternatives, even though multi- 
ple-choice trainees were sometimes re- 
quired to make several selections before 
choosing the right answer. 


SUMMARY 


Eight groups of ten junior college students
were given an experimental training session
with manually controlled teaching machines,
each group being taught with a different
mode of teaching machine operation. The
three independent variables were: (a)
student response mode, (b) size of steps
between successive items, and (c)
sequencing (branching) procedure. A
written criterion test was given to all Ss
immediately after the training session, and
again three weeks later. The same criterion
test was given to a control group which
came from the same school classes as the
experimental Ss but which had no training
on the concepts taught in the experimental
sessions. Dependent variables were the
required teaching machine training time
and scores on the criterion test. Scores on
a pretest were used as a control variable in
a covariance design. Major results were as
follows:

1. Use of the simulated teaching machine
led to significant learning by the Ss, as
determined by comparison with the control
group.
2. The multiple-choice response mode 
took significantly less time than the con- 
structed response mode. No significant dif- 
ference was obtained between response 
modes on the criterion test. 

3. Small item steps required significantly 
more training time, but also yielded sig- 
nificantly higher test scores than large item 
steps on the constructed response criterion 
subtest. 

4. The branching conditions required
less training time than nonbranching, but
were not significantly different on the
criterion test. A significant interaction
was obtained between the mode of response
and branching variables on the constructed
response criterion. This interaction
resulted from a high mean criterion score
obtained by the constructed response,
nonbranching group.

5. No significant differences were ob- 
tained among the experimental groups on 
the multiple-choice criterion subtest, or on 
the total (multiple-choice plus constructed 
response) criterion test. 
OBJECTIVE EVALUATION OF A PROGRAM
IN GENERAL EDUCATION 


GEORGE L. FAHEY AND JOE M. BALL*
University of Pittsburgh 


A program in General Education was 
initiated in The College of the University 
of Pittsburgh in 1955 after two years of 
intensive self-study and painstaking sylla- 
bus development. The faculty approved 
the program in principle but insisted that 
it be adopted experimentally and that 
learning outcomes be measured objectively. 
An extensive pretest battery was applied 
and posttesting was done after one, two, 
and four years. 

The four courses in General Education 
were: 

Core Curriculum 1-2, Written and 
Spoken English, 8 credit hours; 
Core Curriculum 3-4, Humanities, 6 
credit hours; 
Core Curriculum 5-6, Social Science, 
6 credit hours; 
Core Curriculum 7-8, Natural Science, 
6 credit hours.

The research design for the present study
was planned concurrently with the de- 
velopment of course outlines. 

Each of the courses was a two-semester 
offering for freshmen. Each was offered in 
two sections with enrollment limited to 25 
students per section. Teachers were volun- 
teers and were generally distinguished for 
their enthusiasm for General Education. 
All had been members of the University 
faculty for at least one previous year. 
Students entering these courses did so by 
invitation only and were given a remission 
of half tuition for the core courses in which 
they registered. 


*At the time this study was conducted, 
G. L. Fahey was Director of the University 
Testing Service and J. M. Ball was Chair- 
man of the Curriculum Committee of The 
College, University of Pittsburgh. 
HYPOTHESES


This study compared students enrolled 
in core curriculum courses with those en- 
rolled in conventional courses. Groups 
were equated in advance and several 
measures of educational growth were ap- 
plied at different levels. It was hypothe- 
sized that: 

1. Students enrolled in core curriculum 
courses do as well as students enrolled in 
conventional courses on knowledge-type 
criteria of achievement at the termination 
of core curriculum courses and on later 
measures. 

2. Students enrolled in core curriculum 
courses do significantly better than stu- 
dents enrolled in conventional courses on 
“critical-thinking”-type measures of
educational growth. This relative excellence
is positively related to the number of core 
courses taken and is most evident on di- 
rectly related measures; e.g., students who 
have completed the core curriculum course 
in natural science have an advantage on 
the measure of reasoning in the natural 
sciences. 

3. Students enrolled in core curriculum 
courses show less tendency to withdraw 
from college than do those in conventional 
courses. 

Psychometric techniques are not avail- 
able to measure all changes which might 
occur in relation to such differential en- 
rollment and, even if they were, there are 
practical limits of time and student
patience. The differences investigated were
those related to achievement as measured 
by grades earned and achievement tests. 

Teaching effectiveness as such was not 
investigated. Only a few teachers taught 
core curriculum courses, but it would have 
been virtually impossible to contrast them
with all the other instructors who reached 
students in conventional courses. 


METHODS OF MEASUREMENT


As part of the procedure for admission 
to The College each student took the 
American Council on Education Psycho- 
logical Examination (1954), and the Co- 
operative Testing Service achievement 
tests in Social Science, Natural Science, 
Mathematics, Mechanics of Expression, 
Effectiveness of Expression, and Reading 
Comprehension. Scores for this group of
tests were combined to produce a Composite
Predictor Stanine Score which was used to
test comparability of sampling and also as
a pretest score in some comparisons. This
composite score was not available in time
for selection.

During freshman week, the battery of 
the Cooperative Study of Evaluation in 
General Education was administered. This 
included the tests of General Critical 
Thinking, Critical Thinking in the Social 
Sciences, Natural Science Reasoning, and 
Inventory of Beliefs. 

At the end of the freshman year, Mechanics
of Expression and Effectiveness
of Expression were readministered in 
freshman English classes during final ex- 
amination week. The Cooperative Study of 
Evaluation in General Education battery 
was readministered three weeks before the 
end of the year. 

The Cooperative General Culture Test
was administered as a posttest only. It was
used since it appears to measure
conventional curricular objectives and could
not be viewed as biased in favor of a General
Education approach. Some modification
was necessary. The Mathematics subtest
was omitted since the experimental program
did not include direct instruction in
mathematics. Its scores would likely have
served only to identify those who had
elected a mathematics course during the
year. The Current Social Problems subtest
was revised locally to make items more
current. Changes were limited to changes
of names and places. Revising was done
with the permission of the publishers, and
a trial run with University sophomores
showed no change in reliability, mean, or
variance.

Check-type tests may not be as effective 
as actual writing for measuring changes in 
the use of written English. Accordingly, 
English themes were collected from both 
experimental sections and from two con- 
trol sections selected at random. Each stu- 
dent wrote early in the fall on one of two 
topics: Laughter or Idleness. Each wrote 
again in the spring on the other topic. 
Themes were written in class in examina- 
tion booklets and according to identical 
instructions. Students did not know that 
the themes were related to the experiment. 
Themes from the fall were stored until 
those from the spring were available. Then 
pre- and postpairings were made, code 
numbers assigned to each booklet, and all 
identification data removed. All booklets 
were drawn from stock in the fall to pre- 
vent size or color cues. Themes were 
graded on the A through F scale with plus 
and minus shadings. These scores were 
converted to stanines for computations. 
Each theme was read by two independent 
judges. When judges disagreed, an aver- 
age of their judgments was used. Judges 
were all members of the English faculty 
and had several years of local experience. 
None were teachers of the sections in- 
volved and likelihood of recognition by 
handwriting or style was very remote. 
Judges could not know whether a particu- 
lar booklet was for an experimental or con- 
trol S or whether it was written in the 
fall or spring. Interjudge reliability was 
found to be 0.59. 
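The source does not say which coefficient underlies the interjudge reliability of 0.59. Assuming it is an ordinary Pearson product-moment correlation between the two judges' stanine-converted grades, the averaging rule and the coefficient can be sketched as follows. The function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two judges' ratings of the same themes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def combined_ratings(judge1: Sequence[float], judge2: Sequence[float]) -> List[float]:
    """Average the two judgments; where the judges agree this is just the common value."""
    return [(a + b) / 2 for a, b in zip(judge1, judge2)]
```

Averaging every pair, rather than only the disagreements, gives the same result as the rule in the text, because the average of two equal ratings is that rating.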

Freshman quality-point averages were 
computed with the A = 4, F = 0 formula 
for all who completed at least one semester. 
Required physical education and courses 
in military science were not included. In 
addition to an overall quality-point
average for each student, a quality-point
average for noncore curriculum courses was
computed for each experimental student. 
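The quality-point averages described above can be sketched as a credit-weighted mean. The source gives only the A = 4, F = 0 endpoints and the exclusions. This sketch therefore assumes the usual intermediate values (B = 3, C = 2, D = 1) and weighting by credit hours. The `Course` record and its field names are hypothetical.

```python
from typing import Iterable, NamedTuple

# Assumed letter-grade values; the source states only A = 4 and F = 0.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


class Course(NamedTuple):
    name: str
    credits: int
    grade: str
    core: bool = False      # a core curriculum course
    excluded: bool = False  # e.g., required physical education or military science


def quality_point_average(courses: Iterable[Course], include_core: bool = True) -> float:
    """Credit-weighted grade-point mean over the courses that count."""
    counted = [c for c in courses if not c.excluded and (include_core or not c.core)]
    hours = sum(c.credits for c in counted)
    points = sum(GRADE_POINTS[c.grade] * c.credits for c in counted)
    return points / hours
```

Calling the function with `include_core=False` gives the noncore average computed for experimental students. For a hypothetical record of an A in a 4-hour core course, a C in a 4-hour conventional course, and an excluded physical education credit, it returns 3.0 overall and 2.0 for noncore courses.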

At the end of the sophomore and senior 
years, the Area Tests of the Graduate 
Record Examinations were administered 
to all students who would take them. All 
were notified of the tests and strongly 
urged to take them but they were not 
forced. This permissive attitude was 
adopted after considering the effect on 
morale of a student regimented into an 
examination he does not consider essential. 

All scores were converted to stanines on 
local norms. 
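The source does not describe how the local stanine conversion was done. A minimal sketch, assuming the conventional 4-7-12-17-20-17-12-7-4 percent bands applied to midpoint percentile ranks within the local group, might look like this. The constant and function names are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence

# Upper cumulative-percentage bounds for stanines 1-8 under the conventional
# 4-7-12-17-20-17-12-7-4 percent distribution; anything above 96 is stanine 9.
CUMULATIVE_UPPER = [4, 11, 23, 40, 60, 77, 89, 96]


def to_stanines(scores: Sequence[float]) -> List[int]:
    """Convert raw scores to stanines against the group's own (local) distribution."""
    ordered = sorted(scores)
    n = len(ordered)
    stanines = []
    for s in scores:
        below = bisect_left(ordered, s)
        tied = bisect_right(ordered, s) - below
        # Midpoint percentile rank: percent below plus half the percent tied.
        pr = 100.0 * (below + tied / 2) / n
        stanines.append(1 + sum(pr > bound for bound in CUMULATIVE_UPPER))
    return stanines
```

With 100 distinct scores, this assigns 4, 7, 12, 17, 20, 17, 12, 7, and 4 scores to stanines 1 through 9. Tied scores always receive the same stanine.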


POPULATION STUDIED


The total study group included 666 
students. These were all those admitted 
for the semester beginning in September, 
1955, to The College (recently renamed 
School of the Liberal Arts) but excluding 
evening and part-time students, students 
transferring more than six credit hours 
from other collegiate institutions, and a few 
special students carried in The College for 
administrative convenience. 

The population was divided into two 
major groups: (a) those enrolled in core 
curriculum courses (experimentals), and 
(b) those not enrolled in core curriculum
courses (controls). The core curriculum 
students were further divided into sub- 
groups according to the specific core 
courses taken and according to the num- 
ber of core courses taken. Four core 
courses were available and a student might 
elect 1, 2, or 3, but not 4 of them. His 
remaining hours were elected in conven- 
tional courses. 

Enrollment in core curriculum courses 
was voluntary but controlled by quota. 
The research design specified that those 
enrolled must be representative of The 
College population. On the basis of records 
for the freshman class of 1954, a quota 
system based on sex, high school fifth, and 
scores on the admission tests was
developed to guide enrollment of core
curriculum students. When advised of
admission, students were told about the core
program and invited to participate in it.
Those who volunteered were registered for 
the course or courses of their choice up to 
the quota limits. The experimental popu- 
lation thus selected was representative of 
The College freshman class on ratios of 
male to female, high school rank, and low 
to high scholastic aptitude. Each core 
course had two sections and the quota sys- 
tem applied to each section. 

Attrition was inevitable. Some students 
withdrew during the year and some missed 
one or more tests. More complete test data 
were obtained for experimental Ss since en- 
forcement of testing was not as easy with 
the control population which, fortunately, 
was large enough to allow for more attri- 
tion without distortion. 

All analyses were based on the largest 
number of cases for which appropriate 
scores were available. Analyses at the end 
of the freshman year involved 72 or more 
experimental and 454 or more control Ss 
except on the English theme measure where 
data were obtained from a sample of 47 
experimental and 67 control Ss. Table 1
reports numbers of Ss and also tests of rep- 
resentativeness of samples as appropriate. 

Sampling tests, using the composite pre- 
dictor stanine means, indicated that con- 
trol and experimental Ss did not differ 
significantly on selection variables initially 
or at the sophomore or senior levels. In 
both groups, those who survived in college 
did differ significantly from those who 
withdrew. 


RESULTS AND DISCUSSION 


Results are reported in three sections: 
(a) end of the experimental year, (b) end
of the sophomore year, and (c) end of the
senior year.


End of Freshman Year 


TABLE 1

CATEGORIES OF SUBJECTS AND TESTS OF SAMPLING:
COMPOSITE PREDICTOR STANINE MEASURE

Category | Exp. N | Exp. Mean | Exp. SD | Control N | Control Mean | Control SD | CR
Total subjects | 87 | 5.19 | 1.93 | 579 | 4.92 | 2.61 | 1.20
Freshman year withdrawals | 4 | | | 51 | | |
End of freshman year Ss | 83 | | | 528 | | |
Sophomore year withdrawals | 14 | | | 98 | | |
End of sophomore year Ss | 69 | | | 430 | | |
   Avoided Area Tests | (23) | | | (260) | | |
   Took Area Tests | (46) | 5.89 | 1.96 | (170) | 5.60 | 1.91 | 0.89
Junior-senior withdrawals | 9 | | | 97 | | |
Graduated from The College | 44 | 5.95 | 1.81 | 127 | 5.88 | 1.84 | 0.22
   Avoided Area Tests | (3) | | | (13) | | |
   Took Area Tests | (41) | 5.90 | 1.92 | (114) | 5.81 | 1.91 | 0.26
Graduated from professional schools | 4 | | | 88 | | |
Not graduated but still enrolled | 12 | | | 118 | | |
Total not graduated | 39 | 4.64 | 1.48 | 364 | 4.59 | 2.01 | 0.19

Experimental graduated (5.95) vs. experimental not graduated (4.64): CR = 3.64**
Control graduated (5.88) vs. control not graduated (4.59): CR = 6.67**

** = .01 confidence level.

Comparisons of the amount of change
from pre- to posttesting for 14 variables
on which experimental and control Ss
were contrasted are shown in Table 2.
Each student’s prescore on each measure
was subtracted from his postscore. The
differences shown in Table 2 are those
between the means of differences for each
group. The Cooperative General Culture
Test was administered only in the spring.
Quality-point averages could be computed
only at the end of term. For these
nonrepeated measures, the composite predictor
stanine score was used as if it were a
pretest, on the assumption that amount of
change is related to variance expressed by
such a score.
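The gain-score comparison just described can be sketched as follows. The function names are illustrative. For the nonrepeated measures, the composite predictor stanines would be passed in as the `pre` values, following the authors' stated substitution.

```python
from statistics import mean
from typing import Sequence


def mean_gain(pre: Sequence[float], post: Sequence[float]) -> float:
    """Mean of each student's postscore minus prescore."""
    return mean(b - a for a, b in zip(pre, post))


def gain_difference(
    exp_pre: Sequence[float],
    exp_post: Sequence[float],
    ctl_pre: Sequence[float],
    ctl_post: Sequence[float],
) -> float:
    """Experimental mean gain minus control mean gain (positive favors experimentals)."""
    return mean_gain(exp_pre, exp_post) - mean_gain(ctl_pre, ctl_post)
```

A negative result corresponds to the entries marked (−) in the tables, where the change was greater among the controls.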

Only one obtained difference was signifi- 
cant. This was on the Fine Arts subtest of 
the Cooperative General Culture Test. 
Interpretation of this finding should be 
related to the fact that the Humanities 
core course included instruction in fine 
arts, whereas such instruction was rarely
elected by control Ss. Ten other differences
showed greater change among the experi- 
mental students but not at levels of sta- 
tistical confidence. Three observed differ- 
ences favored the controls but not at levels 
of statistical significance. The one involv- 
ing the Natural Science subtest of the 
Cooperative General Culture Test was 
studied in more detail. When the mean of 
the experimental Ss was compared with 
the mean of a control population from 
which preprofessional students (mostly, 
premedical and predental) were excluded, 
the obtained difference was reversed so 
that it was in favor of the experimental Ss 
but it was still within the limits of rea- 
sonable chance. 

The only safe inference from these
findings seems to be that the students
enrolled in core curriculum courses did as
well as or a little better than their peers in
conventional courses on most measured
variables. The first hypothesis of this
study, that they do as well on
knowledge-type criteria, was supported. The
second hypothesis, that they do better on
“critical thinking” measures, was not
supported.

TABLE 2

SIGNIFICANCES OF DIFFERENCES OF CHANGES ON OBJECTIVE TEST MEASURES
(END OF FRESHMAN YEAR)

Objective Test Measure | N, Experimentals | N, Controls | Mean of Experimentals Minus Mean of Controls | CR
English themes | 47 | 67 | (−)0.04 | 0.11
Mechanics of Expression | 72 | 508 | (−)0.17 | 1.04
Effectiveness of Expression | 72 | 508 | 0.13 | 0.65
Cooperative General Culture (a): | | | |
   History and Social Studies | 74 | 471 | 0.42 | 1.91
   Literature | 74 | 471 | 0.41 | 1.85
   Natural Sciences | 74 | 472 | (−)0.20 | 0.88
   Fine Arts | 74 | 472 | 0.66 | 2.89**
   Current Social Problems | 72 | 470 | 0.26 | 1.23
General Critical Thinking | 74 | 456 | 0.13 | 0.77
Critical Thinking in the Social Sciences | 73 | 454 | 0.10 | 0.50
Natural Science Reasoning | 74 | 456 | 0.30 | [?].66
Inventory of Beliefs | 73 | 454 | 0.27 | 1.49
Quality-Point Average, all (a) | 81 | 569 | 0.07 | 0.35
Quality-Point Average (excluding core courses) | 81 | 569 | 0.05 | 0.22

(a) Composite Predictor Stanine score used as pretest score.
** = .01 confidence level.
Comparisons were made on each of the 
measures between subdivisions of the ex- 
perimental group and the controls by the 
mean difference method. Subdivisions in- 
cluded those who took each of the sepa- 
rate core curriculum courses and those who 
took 1, 2, or 3 core courses. All these com- 
parisons were subject to interactions which 
could not be controlled within the size of 
the available subgroups. For example, 
those in 2 or 3 core courses might have had 
any available combination from the core 
courses. Also, those in any “one core 
course” might have been in different core 
courses. A breakdown of experimental Ss 
into each possible combination of specific 
courses and number of courses produced 
subgroups too small for statistical tests. 
Table 3 reports those differences which
reached the .05 or .01 confidence levels.
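The subgroup problem described above can be quantified. Taking the text's rule that a student could elect any 1, 2, or 3 of the four core courses, the sketch below enumerates the possible programs. There are 14, so the 83 experimental Ss remaining at the end of the freshman year (Table 1) would average fewer than six per combination even if evenly spread. The course labels come from the document; the function name is illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

CORE_COURSES = ["CC 1-2", "CC 3-4", "CC 5-6", "CC 7-8"]


def possible_programs(
    courses: Sequence[str] = CORE_COURSES, max_courses: int = 3
) -> List[Tuple[str, ...]]:
    """All distinct sets of 1..max_courses core courses a student could elect."""
    return [
        combo
        for r in range(1, max_courses + 1)
        for combo in combinations(courses, r)
    ]
```

The count breaks down as 4 single-course, 6 two-course, and 4 three-course programs. This is why the authors pooled students by specific course or by number of courses rather than testing each combination.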


Those taking three core courses ap- 
peared in the more favorable position in 
four comparisons. Those taking one core 
course appeared in the more favorable po- 
sition in four comparisons but in the less 
favorable position twice. Those enrolled 
in the Written and Spoken English course 
(Core 1-2) appeared in more favorable 
positions than did any others but did 
not do so on the English measures. The 
Natural Science core course group (Core 
7-8) appeared favorably in several
comparisons but did not do so on the
science measures. The Social Science core
group (Core 5-6) appeared in only one 
positive comparison but that one was on 
the measure of Critical Thinking in the 
Social Sciences. 

There was not more than tentative
support for the hypothesis that excellence in
achievement increases with the number of
core courses taken. The hypothesis that
the effects of a specific core course will be
evident on its relevant measure was
supported only for the Social Science course
and only on one of the social science
measures.

TABLE 3

SIGNIFICANT DIFFERENCES BETWEEN MEANS OF EXPERIMENTAL SUBDIVISIONS AND MEANS
OF CONTROLS (UNCORRECTED FOR COVARIANCE): CONFIDENCE LEVELS AND DIRECTION
OF DIFFERENCE (END OF FRESHMAN YEAR)

Measures | Core 1-2 | Core 3-4 | Core 5-6 | Core 7-8 | 1 core course | 2 core courses | 3 core courses
Ns on English theme measure | 42 | 19 | 20 | [?] | [?] | [?] | [?]
Range of experimental Ns on other measures | 42-47 | 33-41 | 34-41 | 40-47 | 14-18 | 34-38 | 22-25
English themes | o | [?] | [?] | [?] | o | o | o
Mechanics of Expression | o | o | (−).05 | o | (−).05 | o | o
Effectiveness of Expression | o | o | o | o | o | o | o
History and Social Studies | .05 | .05 | o | .05 | o | o | .05
Literature | .05 | o | o | .05 | .05 | o | .01
Science | o | o | o | o | (−).05 | o | o
Fine Arts | [?] | .01 | o | .01 | o | .01 | .01
Current Social Problems | .05 | .05 | o | .05 | o | o | .01
General Critical Thinking | o | [?] | [?] | [?] | .05 | o | o
Critical Thinking in the Social Sciences | o | o | .05 | o | .05 | o | o
Natural Science Reasoning | .05 | o | o | o | .05 | o | o
Inventory of Beliefs | o | o | o | o | o | o | o
Quality-Point Average (all) | o | o | o | o | o | o | o
Quality-Point Average (excluding core courses) | .05 | o | o | o | o | o | o

Note. Changes greater among control Ss indicated by (−).


End of Sophomore Year


During the sophomore year students
were free to elect courses as did other
students in The College and to volunteer for
the Area Tests of the Graduate Record
Examinations. Out of the original groups,
53% of the experimental and 29% of the
control Ss took these tests. In spite of this
reduction in Ss, data in Table 1 indicate
that the experimental and control samples
were still comparable with respect to the
composite predictor. The differential
motivation for taking the Area Tests is not
readily explained. One might infer that
students who take core curriculum courses
show a greater tendency to report for
examinations! Ss by this time were “part of
an experiment,” for the researchers and in
their own minds. No administrative
identification of the kind had been made for a
year.

Comparisons between means of experi- 
mentals and controls are shown in Table 
4. Students from the core curriculum
courses were found to have significantly 
higher means on the Humanities and So- 
cial Science tests. Natural Science test 
means were virtually identical. 
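The significance levels in these comparisons rest on critical ratios. A minimal sketch of the usual critical ratio for two independent means follows, with the conventional normal-curve cutoffs. Applied to the rounded means and SDs in Table 4, it gives about 2.50 for Social Science and 3.61 for Humanities, close to the printed 2.53 and 3.62. The small gaps presumably reflect rounding or a slightly different computation, which the source does not describe. The function names are illustrative.

```python
from math import sqrt
from typing import Optional


def critical_ratio(
    m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int
) -> float:
    """Difference between two independent means divided by its standard error."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se


def confidence_level(cr: float) -> Optional[str]:
    """Two-tailed normal-curve levels conventionally attached to critical ratios."""
    if abs(cr) >= 2.58:
        return ".01"
    if abs(cr) >= 1.96:
        return ".05"
    return None
```

For example, `critical_ratio(5.67, 1.84, 46, 4.91, 1.79, 170)` uses the sophomore Social Science figures. A negative ratio corresponds to a difference favoring the controls.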


End of the Senior Year 


The Area Tests were administered to 
seniors graduating from The College in 
May, 1959. As shown in Table 1, the 
coverage of the graduating class was quite 
complete. Experimental and control Ss 
were still comparable with respect to the 
composite predictor score from the first
year.

TABLE 4

AREA TESTS DIFFERENCES (END OF SOPHOMORE YEAR)

Area Tests | Experimental (N = 46) Mean | SD | Control (N = 170) Mean | SD | CR
Social Science | 5.67 | 1.84 | 4.91 | 1.79 | 2.53*
Humanities | 5.85 | 1.56 | 4.86 | 1.95 | 3.62**
Natural Science | 5.11 | 1.98 | 5.14 | 2.01 | 0.09

* = .05 confidence level.
** = .01 confidence level.

Area Tests results for the seniors are
reported in Table 5. The experimental group
mean on Social Science continued to be
significantly above the mean of the control
group. The relative excellence of the
experimental Ss as sophomores on the
Humanities was no longer evident at the
.05 confidence level. The Natural Science
means remained undifferentiated.

Experimental Ss who had taken the 
Humanities core course reached a signifi- 
cantly higher mean than did the control Ss 
on the Humanities Area Test. Those who 
had taken the Natural Science core course 
fell significantly below the mean of the 
controls on the Natural Science Area Test. 
In 9 of the 10 subgroup comparisons lack- 
ing significance the experimental subjects 
had the higher means. 

On those educational accomplishments
measured by the Area Tests the
experimental Ss seemed to have accumulated no
educational deficit from their experience 
with the core curriculum except that those 
who had taken the Natural Science core 
course were at a disadvantage on the Natu- 
ral Science Area Test. 

An uncontrolled variable in these Area 
Tests comparisons may be the election by 
students of major fields of study. There was 
such selection in the groups studied. 
Thirty-six per cent of experimental and 
48% of control Ss majored in the natural 
sciences. Thirty-one per cent of experi- 
mentals and 17% of controls majored in 
the social sciences. Thirty-one per cent of 
experimentals and 20% of controls ma- 
jored in the humanities. Two per cent of 
experimentals and 15% of controls majored 
in interdisciplinary combinations. Data to 
explain these differential proportions are 
not available although they undoubtedly 
had an influence on the Area Tests means. 
This fact, however, does not detract from 
the test of the as-well-as hypothesis. 

TABLE 5

SIGNIFICANCE OF DIFFERENCES BETWEEN MEANS OF EXPERIMENTAL GROUP AND SUBGROUPS
AND THE MEAN OF THE CONTROLS ON THE AREA TESTS (END OF SENIOR YEAR)

Groups | N | Social Science Mean | SD | CR (a) | Humanities Mean | SD | CR (a) | Natural Science Mean | SD | CR (a)
Control | 114 | 4.78 | 1.99 | | 4.86 | 1.99 | | 5.04 | 2.01 |
All experimental | 41 | 5.61 | 1.75 | 2.50* | 5.49 | 1.86 | 1.82 | 4.93 | 2.02 | (−)0.30
Experimental subgroups: | | | | | | | | | |
   CC 1-2 | 25 | 5.36 | 1.98 | 1.30 | 4.68 | 1.71 | (−)0.45 | 5.12 | 2.03 | 0.18
   CC 3-4 | 18 | 5.72 | 1.88 | 1.97 | 6.17 | 1.[?] | 3.17** | 4.61 | 2.08 | 0.80
   CC 5-6 | 22 | 5.45 | 1.43 | 1.84 | 5.45 | 1.5[?] | 1.50 | 5.41 | 1.95 | 0.80
   CC 7-8 | 26 | 5.50 | 1.65 | 1.92 | 5.27 | 1.9[?] | 0.94 | 4.15 | 1.58 | (−)2.45*

(a) Each critical ratio is based on an experimental mean minus the control mean.
Differences favoring the controls are indicated by (−).
* Significant at the .05 level.
** Significant at the .01 level.

A four-year cumulative quality-point
average was computed for each graduate.
These were converted to stanines. The
mean of this measure for the experimental
Ss was 5.20 with a standard deviation of
1.60. The mean for the controls was 4.94
with a standard deviation of 2.08. The
difference between the means was not
more than chance, but the difference
between the standard deviations exceeded
the .05 confidence level. The core
curriculum students again did as well as or
better than their controls. Why they showed
less variability is not known.
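The source does not say how the two standard deviations were compared. One period-appropriate method is a large-sample critical ratio that uses the approximation SE(SD) = SD / sqrt(2N). The sketch below assumes that method and assumes N = 44 experimental and 127 control graduates of The College from Table 1. An F ratio of the two variances would be an alternative. Under these assumptions the ratio comes to roughly 2.2 in absolute value, beyond the 1.96 needed at the .05 level, which is consistent with the reported result.

```python
from math import sqrt


def sd_critical_ratio(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Large-sample critical ratio for two independent standard deviations.

    Uses the approximation SE(SD) = SD / sqrt(2N) for each group.
    """
    se = sqrt(sd1 ** 2 / (2 * n1) + sd2 ** 2 / (2 * n2))
    return (sd1 - sd2) / se
```

Here `sd_critical_ratio(1.60, 44, 2.08, 127)` is negative because the experimental group was the less variable one.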

The hypothesis that students in the 
core curriculum program would show less 
tendency than students in conventional 
courses to withdraw from college was sup- 
ported. Fifty-five per cent of the experi- 
mentals and 37% of the controls graduated 
from The College or one of the University 
professional schools. This difference in per- 
centage is significant at the .01 level of 
confidence. 
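The graduation comparison can be checked with a pooled two-proportion test using the normal approximation. The source does not state which test the authors used. The sketch assumes the graduates are those counted in Table 1: 44 + 4 = 48 of 87 experimentals and 127 + 88 = 215 of 579 controls, which reproduce the 55% and 37% figures. With these counts the statistic is about 3.2, beyond the 2.58 needed at the .01 level.

```python
from math import sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled z statistic for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se
```

The call `two_proportion_z(48, 87, 215, 579)` returns about 3.21.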


SUMMARY


This study dealt with the effects of a 
General Education program conducted at 
the college freshman level on achievement 
during the first year and at later levels. 
Selected freshmen enrolled in core cur- 
riculum courses in the fall of 1955. All 
other freshmen admitted to the liberal 
arts program were used as controls. Ex- 
perimental Ss were a representative sam- 
ple on age, sex, high school fifth, and 
appropriate aptitude and achievement 
measures. Four experimental courses were 
offered and a student might enroll in 1,
2, or 3 but not 4 of them. 

Measures of achievement were admin- 
istered as pre- and posttests in the fresh- 
man year. At the end of the sophomore 
and senior years the Area Tests of the 
Graduate Record Examinations were ad- 
ministered. Quality-point averages were 
computed at the end of the freshman year 
and at graduation. 

Comparisons were made between experi- 
mental and control Ss on 22 measures. 
On 17 of these the differences favored the 
experimentals and 5 of these 17 differences 
reached or exceeded the .05 confidence 
level. Five of the 22 comparisons favored 
the control Ss but none of these reached 
a level of statistical confidence. 

The hypothesis that the experimental 
Ss would do as well as the controls was 
supported, as was the hypothesis that the
experimentals would show more tendency 
to graduate. The hypothesis that core 
curriculum courses would favorably in- 
fluence performance on similarly named 
Area Tests and on “critical-thinking” 
types of tests was not convincingly sup- 
ported. Experimental Ss generally per- 
formed better on the subtests in social 
science and humanities. They generally did 
not do so well in the natural science area. 


(Received November 30, 1959) 
The most common method used to evaluate the course work of undergraduate students of psychology consists of some kind of objective examination
over the facts and principles covered in 
the course. In courses where the desired 
goal is knowledge of content, this type of 
examination seems adequate. However, in 
courses, such as elementary educational 
psychology, where one of the goals is the 
learning of material for the purpose of 
future application, the investigators raise 
the question, “Is the knowledge of course 
material enough?” Horrocks (1946) has 
offered some evidence indicating that 
knowledge of material does not necessarily 
indicate the ability to use the material 
meaningfully. Tyler (1933) and Wert 
(1937) present evidence that some out- 
comes such as inference and ability to ap- 
ply principles are retained at maximum or 
greater strength while information material 
is almost completely lost over a period of 
a few years in terms of test performance. 
These studies seem to indicate that neglect- 
ing applicational ability may be a serious 
problem in classroom evaluation. 

In the elementary course in educational psychology at The Ohio State University, the concern for the ability of the students to apply knowledge has resulted in the use of group discussion as the primary teaching method in the course. Although this teaching method has been in effect for many years, the primary method of evaluation of the students was the objective multiple-choice examination. Increasing dissatisfaction among the instructors of the course (psychology graduate students, including the investigators) with the evaluation technique led to the development of two tests designed to measure the ability to apply knowledge. One of the tests, Murray Mursell, was similar to the case studies of Horrocks and Troyer (1946). The Horrocks-Troyer case study tests as well as the Murray Mursell test involved individual adolescents and for this reason could not be expected to yield a measure of applicational ability representative of all educational psychology. The investigators in the present study therefore constructed a case study test, The Redwood School,² which was designed to cover the total area of educational psychology as defined by the textbook (Pressey & Robinson, 1944). The Redwood School test was designed to include the total school group as a unit. Information is given about an eight-grade, one-room school and the surrounding community. Detailed information is included about five students of varying age, grade, sex, ability, and problems. In addition to this information, a series of coordinated behavioral incidents is included involving the teacher and pupils interacting in a classroom situation.

1 Grateful acknowledgement is made to John E. Horrocks and Robert J. Wherry of The Ohio State University.

2 The Redwood School test has been deposited with the American Documentation Institute. Order Document No. 6251, remitting $1.25 for microfilm, or $1.25 for photocopies.

TESTS OF ABILITY TO APPLY KNOWLEDGE

Applicational ability implies a skill on the part of the teacher to use facts and principles to form hypotheses (diagnoses) and
take action (remediation) on the basis of 
these hypotheses. Thus, following the case 
study information, 117 diagnostic and re- 
medial questions are included in The Red- 
wood School test, which can be answered 
by either agreeing or disagreeing with the 
statement as to whether it indicates the 
proper diagnosis or remedial action. 

In addition to these case study tests, 
various rating scales have been developed 
(Bowlus, 1955; Bartlett, 1958) which have
been used to evaluate the students’ per- 
formance in the group discussions. Eval- 
uation of students in the educational 
psychology course then consisted of com- 
bining the scores on the multiple-choice 
tests, the case study tests, and the ratings. 


THE PROBLEM 


Although the case study tests were con- 
structed to measure the ability to apply 
knowledge, the only evidence that they 
measured something other than knowledge 
of course material was their own face 
validity. Also, in view of the number of 
evaluation techniques used in the course, 
the authors felt that an analysis of all of 
the techniques used for evaluation might 
be useful. Thus, this study was done for 
the purpose of examining all of the evalua- 
tion techniques of the course as they were 
related to each other. Factor analysis was 
selected as a logical method for examining 
the relationships between these various 
evaluation techniques. 


The Variables 


The following variables were included in 
the analysis: 

1. The Minnesota Teacher Attitude Inventory (Cook, Leeds, & Callis, 1951)—taken by all the students at the beginning of the course.

2. The Minnesota Teacher Attitude In- 
ventory—taken by all the students at the 
end of the course. 

3. The Ohio State Psychological Examination (Toops)*—taken by all students at time of entrance to the university.

4. Point hour ratio—the ratio of honor 
points to the number of course hours com- 
pleted. 

5. Overall Leadership—as measured by 
the Bartlett Leadership Behavior Scale 
(BLBS [Bartlett, 1958])—peer rating. 

6. Contribution of Ideas and Informa- 
tion (BLBS)—peer rating. 

7. Contribution of Friendly Atmosphere (BLBS)—peer rating.

8. Contribution of Labor and Effort 
(BLBS)—peer rating. 

9. Contribution of Policy and Decisions 
(BLBS)—peer rating. 

10. Overall Leadership (BLBS)—self- 
rating. 

11. Contribution of Ideas and Information (BLBS)—self-rating.

12. Contribution of Friendly Atmos- 
phere (BLBS)—self-rating. 

13. Contribution of Labor and Effort 
(BLBS)—self-rating. 

14. Contribution of Policy and Decisions 
(BLBS)—self-rating. 

15. Overall Leadership—peer nomina- 
tion. 

16. Contribution of Ideas and Informa- 
tion—peer nomination. 

17. Contribution of Friendly Atmos- 
phere—peer nomination. 

18. Contribution of Policy and Decisions 
—peer nomination. 

19. Contribution of Labor and Effort— 
peer nomination. 

20. The Bowlus Group Effectiveness 
Scale (Bowlus, 1955)—a forced-choice rat- 
ing scale designed to measure the effective- 
ness in group discussion. 

21. A multiple-choice examination over 
textbook. 

22. A multiple-choice examination over 
textbook. 

23. Age—although not used for evaluation, it was felt that correlation with age might be helpful in evaluating the techniques.

* H. A. Toops, The Ohio State University Psychological Test, Columbus, Ohio College Association.

24. Murray Mursell (diagnostic)—the
portion of a case study examination used 
to measure the ability of a person to diag- 
nose behavior. 

25. Murray Mursell (remedial)—the 
portion of a case study examination used 
to measure the ability to recommend reme- 
dial action. 

26. The Redwood School (diagnostic)— 
the diagnostic portion of a case study ex- 
amination, written by the authors. 

27. The Redwood School (remedial)— 
the remedial portion of a case study ex- 
amination, written by the authors. 

28. Sex—male was scored as high on the 
dichotomy. 

29. Year in college—how many years of 
college the student has had. 

30. Grade in general psychology course—a course grade in a psychology course, based on multiple-choice tests, which was a prerequisite to educational psychology.

Although not all variables were used as evaluation techniques, it was considered that all variables might be useful for the purpose of analyzing the evaluation techniques. Of the ten variables obtained from
the Bartlett Leadership Behavior Scale, 
the overall leadership measures (peer and 
self-rating) were based on forced-choice 
ratings. The measures of contributions to 
the group from the BLBS were ipsative 
measures from a descriptive ranking type 
scale. That is, they measured contributions 
in terms of relative strength to each other 
rather than being a measure of absolute 
contribution in each area. 

Data were collected from 100 students in the educational psychology course, and the 30 variables were intercorrelated and factor analyzed⁴ using a modification of the multiple-group centroid method. All factors were rotated to maximize psychological meaningfulness. Orthogonality was preserved for all factors. All residuals were .15 or below, with two exceptions. A residual correlation of .34 was left between the pretest and posttest on the Minnesota Teacher Attitude Inventory; it represented the reliability of the scale. The intercorrelations among the ipsative measures on the BLBS were all negative as a result of their ipsative nature, and it did not seem meaningful to extract factors that were only a function of the ranking method used to measure these variables.

4 The entire orthogonal factor matrix and a table of intercorrelations and residuals have been deposited with the American Documentation Institute. Order Document No. 6251, remitting $1.25 for microfilm, or $1.25 for photocopies.

C. J. BARTLETT, R. R. RONNING, AND J. G. HURST


RESULTS 


Seven factors were extracted. In presenting each factor, variables having loadings of .20 or higher in absolute value will be included, with the factor loading shown following the variable.
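The reporting rule used for each factor (keep a variable when its loading reaches .20 in either direction, since the lists include negative loadings, then list the variables by size) can be expressed as a short filter. In this Python sketch the first six loadings are those printed for Factor VI; the seventh entry, `hypothetical variable`, is invented solely to show a loading that falls below the cutoff, and the function name `salient_loadings` is likewise illustrative.

```python
# Loadings printed for Factor VI; the last entry is hypothetical and exists
# only to show a variable that falls below the cutoff.
factor_vi = {
    "(13) Labor and Effort (self-rating)": -0.89,
    "(14) Policy and Decisions (self-rating)": 0.60,
    "( 9) Policy and Decisions (peer rating)": 0.40,
    "(25) Murray Mursell test (remedial)": 0.25,
    "(24) Murray Mursell test (diagnostic)": 0.24,
    "( 8) Labor and Effort (peer rating)": -0.24,
    "hypothetical variable": 0.12,
}


def salient_loadings(loadings, cutoff=0.20):
    """Return (name, loading) pairs with |loading| >= cutoff, largest |loading| first.

    sorted() is stable, so variables with equal absolute loadings keep their input order.
    """
    kept = [(name, value) for name, value in loadings.items() if abs(value) >= cutoff]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


for name, value in salient_loadings(factor_vi):
    print(f"{name} ({value:+.2f})")
```

Sorting on the absolute value is what places the −.89 loading first and keeps the −.24 loading in the list alongside the positive ones.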

Factor I. General Achievement Ability. 
The highest loading on point hour ratio 
and the high loadings on other achieve- 
ment related variables seem to make the 
identification fairly clear. The variables 
loading on the factor are as follows: 


( 4) Point hour ratio (.70)
( 3) Ohio State Psychological Exam (.62)
(21) Multiple-choice test (.55)
(16) Ideas and Information (nomination) (.49)
( 2) MTAI at end of quarter (.47)
(22) Multiple-choice test (.44)
(27) Redwood School test (remedial) (.44)
(18) Labor and Effort (nomination) (.41)
(15) Overall Leadership (nomination) (.39)
(26) Redwood School test (diagnostic) (.38)
(19) Policy and Decisions (nomination) (.36)
(20) Bowlus Scale (.36)
( 1) MTAI at beginning of quarter (.35)
(12) Friendly Atmosphere (self-rating) (−.34)
(24) Murray Mursell test (diagnostic) (.33)
( 7) Friendly Atmosphere (peer rating) (−.30)
( 6) Ideas and Information (peer rating) (.29)
(30) Grade in general psychology (.27)
( 5) Overall Leadership (peer rating) (.26)
(25) Murray Mursell test (remedial) (.25)
(29) Year in college (.22)


Factor II. Knowledge of Facts and Principles. The loadings on the multiple-choice examinations and on general psychology grade (which is also based on multiple-choice examinations) indicate that this factor represents an orientation toward learning factual material. Another possible identification for this factor might be a special ability in multiple-choice type examinations. It is interesting to note the negative loading on the MTAI at the beginning of the course, indicating that persons who have a factual orientation are likely to enter the course with attitudes (as measured by the MTAI) considered by some to be undesirable for persons in the teaching profession. The variables representing this factor are included below:

(21) Multiple-choice test (.45)
(22) Multiple-choice test (.45)
(30) Grade in general psychology (.41)
( 1) MTAI at beginning of quarter (−.39)
(28) Sex (male higher) (.26)
(23) Age (.21)
(12) Friendly Atmosphere (self-rating) (.21)

Factor III. Applicational Ability. This factor seems to be dominated by the case study tests. Since other measures of knowledge of material show negligible loadings, this factor seems to be something other than knowledge of course material. This factor also shows loadings on leadership and favorable attitude, suggesting a set for putting knowledge into positive action. Since the case study tests were written for the purpose of measuring the ability to apply knowledge, this factor tends to support the original hypothesis that the case study tests do measure the ability to apply knowledge of educational psychology. Variables having loadings of .20 or greater are:


(27) Redwood School test (remedial) (.68)
(26) Redwood School test (diagnostic) (.56)
(25) Murray Mursell test (remedial) (.42)
(10) Overall Leadership (self-rating) (.26)
(13) Labor and Effort (self-rating) (.22)
( 5) Overall Leadership (peer rating) (.21)
( 1) MTAI at beginning of quarter (.20)
( 2) MTAI at end of quarter (.20)


Factor IV. Leadership. The identification 
of this factor seems to be fairly clear since 
all of the high loadings, including those for 
the forced-choice scales, are leadership 
measures. It is interesting to note that 
friendly atmosphere as an ipsative measure 
loads negatively, but when measured from 
nominations it loads positively. This sug- 
gests that it is desirable to contribute 
friendly atmosphere to a group to become 
a leader, but if friendly atmosphere is the 
strongest contribution (relative to other 
contributions) it is negatively related to 
leadership. It is also interesting to note that self-ratings of overall leadership on the BLBS show a substantial loading. This conflicts with the findings of a previous study, which found that self-ratings of overall leadership on the BLBS were not related to general leadership as measured by peer ratings and nominations (Bartlett, 1959). Further research seems to be needed before any conclusions can be drawn concerning the use of self-ratings as a leadership measure. The variables which identify this factor include:

(15) Overall Leadership (nomination) (.79)
(16) Ideas and Information (nomination) (.77)
(19) Policy and Decisions (nomination) (.61)
( 5) Overall Leadership (peer rating) (.58)
(18) Labor and Effort (nomination) (.53)
(10) Overall Leadership (self-rating) (.42)
(17) Friendly Atmosphere (nomination) (.29)
(12) Friendly Atmosphere (self-rating) (−.24)
( 7) Friendly Atmosphere (peer rating) (−.22)
(28) Sex (male higher) (.20)
Factor V. Age-Sex. The older males seem to be strongest in the contribution of ideas and information to the group, and weakest in contribution of labor and effort. Note that although nominations on leadership tend to go toward the older males, ratings, when bias is controlled, show no age-sex discrimination. This factor includes the following variables:

(29) Year in college (.80)
(23) Age (.75)
(28) Sex (male higher) (.59)
(11) Ideas and Information (self-rating) (.40)
(19) Policy and Decisions (nomination) (.35)
( 6) Ideas and Information (peer rating) (.29)
( 8) Labor and Effort (peer rating) (−.28)
(16) Ideas and Information (nomination) (.25)
(15) Overall Leadership (nomination) (.24)

Factor VI. Decision Making vs. Labor and Effort. The high loadings on these two categories of self-ratings on the BLBS seem to identify this factor. Peer ratings on these categories seem to verify the identification. The difficulty in interpretation lies in the smaller loadings on the Murray Mursell case study test. This test was used as a mid-quarter examination, and Redwood School was used as a final examination. On the earlier case study test, when students had less experience with this kind of examination, the ability to make decisions appears to have helped to a slight degree in successful performance. This advantage to the decision maker does not seem to be present after more experience in taking case study tests, since the Redwood School test has negligible loadings on this factor. These variables identify this factor:

(13) Labor and Effort (self-rating) (−.89)
(14) Policy and Decisions (self-rating) (.60)
( 9) Policy and Decisions (peer rating) (.40)
(25) Murray Mursell test (remedial) (.25)
(24) Murray Mursell test (diagnostic) (.24)
( 8) Labor and Effort (peer rating) (−.24)
Factor VII. Bowlus Scale. The amount of variance explained by this factor is so low that it hardly merits interpretation. The Bowlus Scale seems to identify those who tend to be high on contributions of labor and effort. Ratings on the Bowlus Scale also seem to be biased toward upper-class members of the group. The variables included in this factor are:

(20) Bowlus Scale (.30)
(18) Labor and Effort (nomination) (.29)
(29) Year in college (.25)


DISCUSSION


One of the principal reasons for doing the study was to furnish some information about case study tests. The writers had questioned the adequacy of the multiple-choice examination as an evaluation instrument when application of material learned is a desired goal of a course.

Inasmuch as both the multiple-choice 
examinations and the case study tests show 
a substantial loading on Factor I, General 
Achievement Ability, it seems that at least 
a minimum amount of intelligence and 
knowledge is necessary for good perform- 
ance on both kinds of tests. A high level 
of intelligence and achievement is certainly 
a desirable asset to someone being pre- 
pared for the teaching profession. 


Factor II, Knowledge of Facts and Principles, indicates that multiple-choice exams also measure something independent of intelligence and achievement level; this seems to be the ability to learn factual material in psychology. Multiple-choice examinations show a relationship with attitudes frequently presumed undesirable for persons entering the teaching profession (as indicated by low scores on the MTAI). By the end of the quarter this relationship is not as pronounced, but this might reflect these students' ability to bias their MTAI responses in light of knowledge gained during the quarter, rather than any change in attitude. It is also interesting to note that persons who get higher scores on multiple-choice tests rate contribution of friendly atmosphere as being their strongest point in group discussions.

Factor III, Applicational Ability, indi- 
cates that persons who perform well on the 
case study tests have a favorable attitude, 
both on entrance to the course and at the 
end of the quarter. These people also tend 
to be rated as the leaders of their group. 
These high scorers also tend to perceive contribution of labor and effort as their strongest asset in group participation. Factor III is also independent
of general intellectual level. 

It has already been stated that general intelligence and high achievement are desired characteristics of prospective teachers. Inasmuch as both the multiple-choice tests and the case study tests show a relationship with Factor I, General Achievement Ability, it would seem that both types of tests may have merit as achievement measures. A further comparison of multiple-choice and case study tests, made by comparing Factor II with Factor III, indicates that Factor III, which is dominated by the case study tests, also shows relationships with favorable attitudes and leadership as well as with applicational ability. Both of these attributes would seem to be desirable characteristics to cultivate in teacher training programs. Thus, the case study tests appear to provide a more comprehensive evaluation of students' success in achieving the goals of a teacher training program.

If one is willing to accept the desirability of the qualities which show loadings on Factor III, it would seem that he must also admit the undesirability of some of the qualities of Factor II. The writers would hesitate to recommend that multiple-choice examinations be eliminated and that nothing but case study tests be used in classes where application of knowledge is desired. Further research might indicate additional advantages of the multiple-choice examination not shown in this study. The point is made, however, that the present exclusive reliance on multiple-choice tests might be modified by supplementing them with a test designed to measure ability to apply knowledge. The case study type of test is recommended.

One major advantage that the case study 
test seems to offer is that the case study 
sets the situation more clearly and pro- 
vides more information on which to base 
an application response than an isolated 
item can. It is conceivable that the use of 
the case study might be useful in setting
the situation more clearly in various other 
kinds of tests such as tests of personality 
and interests. At least it would seem that 
this is a possibility worth further explora- 
tion. 


SUMMARY 


A study was done of the evaluation 
techniques used in an elementary course 
in educational psychology. One of the ma- 
jor purposes of this study was to evaluate 
several tests of the ability to apply knowl- 
edge. These tests involved the use of a 
case study followed by questions to be 
answered by the students. Data were col- 
lected on 30 variables. A factor analysis 
was performed and seven factors were ex- 
tracted. A factor of general achievement 
ability was found which indicated that 
performance on the case study tests was 
related to general ability to achieve. An- 
other independent factor emerged, how- 
ever, which indicated that the case study 
tests may also be related to the ability to 
apply knowledge and was associated with such favorable characteristics as good attitude on the MTAI and leadership in small groups. A third
factor emerged which indicated that what- 
ever multiple-choice tests measure beyond 
general achievement ability is not neces- 
sarily related to desirable qualities of our 
future teachers, although an abundant 
knowledge of facts and principles may be 
desirable in some situations. For courses 
where a major goal is application of knowl- 
edge, the case study approach to testing 
for evaluative purposes seems to be a 
worthwhile and justifiable approach. 
THE EFFECT OF INTRACLASS ABILITY GROUPING ON
ARITHMETIC ACHIEVEMENT IN THE SIXTH GRADE 


NORMAN E. WALLEN and ROBERT O. VOWLES
University of Utah 


In this study, we have compared the 
achievement outcomes of intraclass ability 
grouping vs. nongrouping procedures in 
arithmetic instruction where the basis of 
grouping is the students’ arithmetic 
achievement level. We were concerned with 
the comparison of the two teaching meth- 
ods and with the possibility that the effi- 
cacy of the two methods would differ with 
different teachers. 

The effect of ability grouping for in- 
structional purposes, though frequently 
debated, has been the focus of surprisingly 
little research. In clearly defined skill 
areas such as arithmetic, the desirability 
and feasibility of basing instruction on the 
learner’s present level of performance 
would seem clearly indicated. Disagreement 
exists, however, as to the procedures most 
likely to approximate the ideal of indi- 
vidualized instruction. Proponents of abil- 
ity grouping contend that such procedures 
provide a better approximation than when 
grouping is not employed, since the teacher 
is able to deal with a restricted range of 
ability. Opponents of grouping contend 
that a teacher can provide adequate indi- 
vidualized instruction and that grouping 
fosters undesirable social and emotional 
learning. 

The only published research on this 
problem of which we are aware is that of 
Jones (1948). Five teachers employed in- 
telligence and achievement data on their 
fourth-grade students in attempting to 
individualize instruction, in part through 
grouping procedures, though the precise 
bases for grouping are not stated. In arith- 
metic, as in other skill areas, these stu- 
dents showed significantly more gain over 
an academic year’s time than a matched 
control group selected from the classes of 
an additional twenty-six teachers, wherein 
less specific provision for individual differences was made. This study supports the
notion that a concerted attempt to indi- 
vidualize instruction is desirable, but does 
not permit evaluation of specific proce- 
dures or comparison of their effectiveness 
with different teachers. 


PROCEDURE 


Two elementary schools were involved in 
the study, both serving rather new sub- 
divisions in the suburbs of Salt Lake City. 
The educational and economic status of 
the families served by both schools is some- 
what above average. Each school employs 
two teachers of the sixth grade (one male 
and one female), making a total of four 
teachers who participated in the study. 
They had all been teaching for at least 
five years prior to the study. 

In order to facilitate equating of the 
four classes studied in terms of arithmetic 
ability at the beginning of the experimental 
year, scores on the arithmetic subtest of 
the California Achievement Tests given in 
the spring of the previous year were used 
in allocating students to the two sixth 
grades within each school. The scores were 
simply ranked and the student represented 
by every other score assigned to the same 
class. A different form of the test was ad- 
ministered during the first week of the 
sixth grade, both as a check on the match- 
ing of classes and as the base point from 
which to assess gains. In order to facilitate 
statistical analysis, the scores of four stu- 
dents (randomly selected) were excluded 
from analysis (three from one class and 
one from another) leaving an N of 25 in 
each of two classes and an N of 31 in each 
of the other two classes. The net result of these procedures was to provide at the beginning of the experimental year four groups of students quite closely equated as to initial achievement (and closely approximating the expected values for mean and standard deviation), as shown in Table 1.

TABLE 1
MEANS AND STANDARD DEVIATIONS OF
GRADE-EQUIVALENT SCORES OF THE
FOUR CLASSES AT THE BEGINNING
OF THE STUDY

Class   N    Mean   SD
I       31   5.78   .67
II      25   6.05   .73
III     31   5.86   .76
IV      25   5.93   .81
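The "every other score" rule for forming the two classes in each school amounts to an alternating deal down a ranked list. The Python sketch below assumes a dictionary of hypothetical student identifiers and prior-spring scores, and breaks ties by identifier; the study does not say how ties were handled, and the function name `alternate_assign` is illustrative.

```python
def alternate_assign(scores):
    """Rank students from highest to lowest score and deal them alternately
    into two classes (1st, 3rd, 5th, ... to one; 2nd, 4th, 6th, ... to the other).

    `scores` maps a hypothetical student id to a prior-spring score.
    Ties are broken by student id so the result is reproducible (an assumption).
    """
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return ranked[0::2], ranked[1::2]


# Hypothetical grade-equivalent scores for six students.
demo = {"s1": 6.4, "s2": 5.1, "s3": 5.9, "s4": 6.8, "s5": 5.5, "s6": 6.0}
a, b = alternate_assign(demo)
print(a)  # ['s4', 's6', 's5']
print(b)  # ['s1', 's3', 's2']
```

One property of this rule worth noting: the class that receives the first-ranked student also receives the higher score of every adjacent pair, so it starts with a slight edge. The fall retesting summarized in Table 1 is what checks how closely the classes actually matched.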

In order to control the dimension of 
teacher personality and to permit evalua- 
tion of teacher-method interaction, each of 
the four teachers employed both methods. 
Within each school, one teacher taught 
the first half of the year using the group- 
ing approach and then changed to a non- 
grouping approach for the second half of 
the year, with the same students through- 
out. The other teacher in the school began 
the year with a nongrouping approach and 
changed to grouping at midyear. Additional forms of the California test were administered at midyear and again in the spring, providing an achievement measure for each student after exposure to each
of the two methods. The time lapse be- 
tween fall and midyear testing and between 
midyear and spring testing was very close 
to 70 school days. The overall design of the 
study is perhaps best communicated by 
Table 2. Sequence A simply means that the 
nongrouping method was used during the 
first half of the year, followed by the 
grouping method during the second half 
of the year. Sequence B was the reverse; 
these two teachers began with the group- 
ing method and changed to nongrouping at 
midyear. 

The basis for grouping, when done, was the appropriate administration of the Arithmetic subtest of the California Achievement Test. Thus, the fall testing was used for the two classes in sequence B and the midyear testing was used for the two classes in sequence A. In all cases, the teachers used the ranking of test scores to divide the class into four homogeneous groups of approximately equal size; the smallest subgroup had five pupils, the largest eight.
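Dividing a class into four ranked groups of roughly equal size can be sketched as follows. This is a hypothetical Python illustration, not the teachers' procedure: it cuts the ranked list into consecutive blocks whose sizes differ by at most one, whereas the subgroups actually formed ranged from five to eight pupils, which suggests the teachers did not always cut at equal counts. The function name `split_into_groups` and the class scores are made up.

```python
def split_into_groups(scores, n_groups=4):
    """Rank students from highest to lowest score and cut the ranked list into
    n_groups consecutive blocks whose sizes differ by at most one."""
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    base, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # the first `extra` groups get one more pupil
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Hypothetical class of 31 pupils with distinct made-up scores.
class_scores = {f"s{i:02d}": 5.0 + i * 0.05 for i in range(31)}
groups = split_into_groups(class_scores)
print([len(g) for g in groups])  # [8, 8, 8, 7]
```

For a class of 25 the same rule gives groups of 7, 6, 6, and 6.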

Regardless of the method being used, the 
amount of time devoted to arithmetic was 
to be the same; a daily period of 40 or 50 
min. We feel confident that this schedule 
was adhered to as rigidly as is reasonable 
to expect and resulted in no advantage to 
either method. The essential differences in 
classroom procedure were as follows: 


Nongrouping. All students used the same 
text. Each instructional period began with a 
presentation or discussion by the teacher, 
after which each student worked individu- 
ally, the teacher giving individual help to 
the students. 

Grouping. Each of the four groups used a 
text appropriate to their level of perform- 
ance. The teacher met with each group—as 
a group—for at least three 20 min. periods 
each week during which time the students 
were encouraged to help each other as well 
as receiving help from the teacher. The re- 
mainder of the time was devoted to indi- 
vidual work. 


The statistical procedures were as fol- 
lows: 

Each of the three score distributions (fall, winter, and spring) was normalized over all 112 students, and each score was converted to a standard score (mean = 50, SD = 10) based on the appropriate distribution. Analysis of covariance was then employed, using winter and spring scores as the dependent variates, fall scores as the covariate for winter scores, and winter scores as the covariate for spring scores.
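Normalizing a distribution and expressing it as standard scores with mean 50 and SD 10 is commonly done by an area transformation: each score's percentile rank is converted to the corresponding normal deviate z, and the result is reported as 50 + 10z. The article does not say which percentile convention was used, so the Python sketch below assumes mid-rank percentiles, (rank − 0.5)/n, with tied scores sharing their average rank. It relies only on `statistics.NormalDist.inv_cdf` from the standard library; the function name and raw scores are hypothetical.

```python
from statistics import NormalDist


def normalized_t_scores(raw):
    """Area transformation: percentile rank -> normal deviate z -> 50 + 10z.

    Assumes mid-rank percentiles, (rank - 0.5) / n, with tied scores sharing
    their average rank; the original study does not state its convention.
    """
    n = len(raw)
    order = sorted(range(n), key=lambda idx: raw[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the block of scores tied with raw[order[i]].
        j = i
        while j + 1 < n and raw[order[j + 1]] == raw[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    unit_normal = NormalDist()  # mean 0, SD 1
    return [50 + 10 * unit_normal.inv_cdf((r - 0.5) / n) for r in ranks]


t = normalized_t_scores([5.2, 6.1, 6.1, 7.0])  # hypothetical raw scores
print([round(x, 1) for x in t])  # [38.5, 50.0, 50.0, 61.5]
```

Unlike a linear z-score conversion, this transformation forces the result toward a normal shape regardless of the skew of the raw grade-equivalent scores.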


RESULTS AND DISCUSSION 


Table 2 indicates the mean, standard deviation, and number of cases for each cell of the design and the means for the method, teacher, and school effects. A Bartlett analysis indicated no significant differences among the cell variances. Table 3 reports the results of the analysis of covariance. It will be noted that the “school by sequence” interaction is the same as a Between Teachers effect corrected for school and sequence effects. The appropriate error term for testing the uncorrelated components is the Between Individuals mean square; for testing the correlated components and the Between Individuals effect itself, the appropriate error term is the Within Cells mean square.

TABLE 2
CELL MEANS AND STANDARD DEVIATIONS

School | Sequence (Teacher) | Grouping | Nongrouping | Teacher Means | School Means
1 | A (Teacher I) | Second Semester: Mean 48.29, SD 9.42, N 31 | First Semester: Mean 48.65, SD 9.96, N 31 | 48.47 | 48.74
1 | B (Teacher III) | First Semester: Mean 50.19, SD 9.61, N 31 | Second Semester: Mean 47.84, SD 9.32, N 31 | 49.01 |
2 | A (Teacher II) | Second Semester: Mean 50.68, SD 9.77, N 25 | First Semester: Mean 48.64, SD 7.70, N 25 | 49.66 | 51.46
2 | B (Teacher IV) | First Semester: Mean 52.44, SD 10.08, N 25 | Second Semester: Mean 54.08, SD 10.41, N 25 | 53.26 |
Methods Means | | 50.27 | 49.63 | Grand Mean = 49.95 |

TABLE 3
ANALYSIS OF COVARIANCE

Source | df | x² | xy | y² | Adjusted y² | Mean Square | F
Uncorrelated Components | | | | | | |
Between Schools | 1 | 194 | 281 | 409 | 51 | 51 | 3.48
Between Sequences | 1 | 97 | 140 | 205 | 27 | 27 | 1.84
School by Sequence Interaction (same as Residual Between Teachers) | 1 | 0 | 10 | 129 | 111 | 111 | 7.57*
Between Individuals within School Sequences | 107 | 19,574 | 18,481 | 19,019 | 1570ᵃ; 3900ᵇ | 14.67ᵃ; 36.47ᵇ | 3.48*
Correlated Components | | | | | | |
Between Semesters | 1 | 0 | 0 | 0 | 0 | 0 | 0
Between Methods | 1 | 87 | -45 | 24 | 2 | 2 | <1.00
Teacher by Method Interaction | 3 | 90 | -43 | 149 | 130 | 43.33 | 4.14*
School by Method Interaction | 1 | 57 | 23 | 7 | 33 | 33 | 3.15
School by Semester Interaction | 1 | 33 | -66 | 142 | 94 | 94 | 8.98*
Within Cells (Subject by Method Interaction) | 107 | 1372 | -535 | 1329 | 1120 | 10.47 |
Total | 221 | 21,414 | 16,870 | 21,264 | | |

* Significant at .01 level.
ᵃ When used as error term.
ᵇ When used as “effect”.
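The Bartlett check on homogeneity of cell variances mentioned above can be sketched as follows. This is a minimal illustration, assuming the test is applied to the eight cell standard deviations and Ns printed in Table 2 (the authors may have applied it to other quantities); the helper name `bartlett_statistic` is hypothetical.

```python
import math

# Cell SDs and Ns as printed in Table 2: (SD, N) for each teacher x method cell.
CELLS = [
    (9.42, 31), (9.96, 31),    # Teacher I: grouping, nongrouping
    (9.61, 31), (9.32, 31),    # Teacher III
    (9.77, 25), (7.70, 25),    # Teacher II
    (10.08, 25), (10.41, 25),  # Teacher IV
]


def bartlett_statistic(cells):
    """Return Bartlett's corrected chi-square statistic and its degrees of freedom."""
    k = len(cells)
    dfs = [n - 1 for _, n in cells]
    variances = [sd ** 2 for sd, _ in cells]
    total_df = sum(dfs)
    # Pooled variance weights each cell variance by its degrees of freedom.
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df
    numerator = total_df * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return numerator / correction, k - 1


chi2, df = bartlett_statistic(CELLS)
print(f"Bartlett chi-square = {chi2:.2f} on {df} df")
```

With these inputs the statistic falls well below 14.07, the .05 critical value of chi-square for 7 df, which is consistent with the reported absence of significant differences among the cell variances.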

There is no support in these data for 
the notion of a difference between group- 
ing and nongrouping procedures, the initial 
question posed in the study. Nor are there 
significant differences Between Sequences 
or Between Schools, though the latter ef- 
fect approaches significance at the .05 
level. The Between Semesters contribu- 
tion to the total variance was inevitably 
zero, since the initial standardizing of 
the three distributions of scores required 
that the means of the first and second 
semester distributions be equal. 
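Each F in Table 3 is an adjusted mean square divided by the error term named in the text: the Between Individuals mean square for the uncorrelated components and the Within Cells mean square for the rest. The short check below only recomputes the printed F values from the printed mean squares; it is arithmetic on Table 3, not a reanalysis of the raw scores. For reference, the .05 and .01 critical values of F with 1 and 107 df are roughly 3.9 and 6.9, which is why a Between Schools F of 3.48 only approaches the .05 level.

```python
# Adjusted mean squares used as error terms in Table 3.
MS_BETWEEN_INDIVIDUALS = 14.67  # error term for the uncorrelated components
MS_WITHIN_CELLS = 10.47         # error term for the correlated components

# effect name -> (adjusted mean square of the effect, error mean square)
EFFECTS = {
    "Between Schools": (51, MS_BETWEEN_INDIVIDUALS),
    "Between Sequences": (27, MS_BETWEEN_INDIVIDUALS),
    "Between Teachers (School by Sequence)": (111, MS_BETWEEN_INDIVIDUALS),
    "Between Individuals": (36.47, MS_WITHIN_CELLS),
    "Between Methods": (2, MS_WITHIN_CELLS),
    "Teacher by Method": (43.33, MS_WITHIN_CELLS),
    "School by Method": (33, MS_WITHIN_CELLS),
    "School by Semester": (94, MS_WITHIN_CELLS),
}

# F = effect mean square / error mean square.
f_ratios = {name: ms / error for name, (ms, error) in EFFECTS.items()}

for name, f in f_ratios.items():
    print(f"{name:<40} F = {f:.2f}")
```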

Differences which are significant at the
.01 level were obtained for the following
components:


1. Between Individuals. This finding indi- 
cates that there are differences among stu- 
dents in achievement when the measure used 
is the sum of their midyear and final scores 
with each, in effect, corrected for perform- 
ance at the beginning of the semester. 

2. Between Teachers (School by sequence interaction). This finding indicates differences between the four teachers in terms of student achievement. Inspection of Table 2 reveals a marked superiority on the part of students taught by Teacher Number 4. Reference to the original, untransformed data shows that whereas the other three teachers ranged from .47 to .54 in mean gain in grade equivalent score per semester, the mean gain for Teacher 4 was .75. The expected value based on test norms is .50. In exploring possible reasons for this difference, one dimension which emerged was that of “homework.” Discussion with the teachers indicated that this teacher tended to assign more homework than the other three. In view of the generally negative findings as to the effect of homework (Otto, 1950), we find it difficult to attribute differences to this variable.

Another explanation, which we favor, is that this teacher is better able to individualize instruction. This possibility is suggested by the finding that this teacher achieved considerably better results with the nongrouping method. Because of complications in interpreting the Teacher by Method Interaction, however, this interpretation is highly tentative.

TABLE 4
TEACHER BY METHOD INTERACTION EFFECT

School | Teacher | Grouping | Nongrouping
1 | I | -.50 (second semester) | +.50 (first semester)
1 | III | +.86 (first semester) | -.86 (second semester)
2 | II | +.70 (second semester) | -.70 (first semester)
2 | IV | -1.14 (first semester) | +1.14 (second semester)

TABLE 5
SCHOOL BY SEMESTER INTERACTION EFFECT

School | First Semester | Second Semester
1 | +.68 | -.68
2 | -1.24 | +1.24

3. Teacher by Method Interaction. It will be recalled that the possibility of an interaction effect of teachers and methods was one of the questions of interest. The significant interaction appears, at first, to support this notion. Table 4 presents the interaction effect. Subsequent partitioning of the interaction into three sums of squares (Sequences, School by Semester, and School by Methods), each with one degree of freedom, complicates its interpretation, however, because of the School by Semester interaction, as shown by its significant F value and as illustrated in Table 5. In School 2, considerably better results were obtained during the second semester, regardless of the method used, whereas in School 1 the difference favored the first semester, though to a lesser degree.
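The entries in Table 4 can be reproduced from the Table 2 means if each interaction effect is taken as cell mean minus teacher mean minus method mean plus grand mean, the usual two-way interaction residual. This is our reading of how the table was built rather than a procedure the authors state; the sketch applies it to the printed means only.

```python
# Means as printed in Table 2 (standardized scores).
CELL_MEANS = {
    "I": {"grouping": 48.29, "nongrouping": 48.65},
    "III": {"grouping": 50.19, "nongrouping": 47.84},
    "II": {"grouping": 50.68, "nongrouping": 48.64},
    "IV": {"grouping": 52.44, "nongrouping": 54.08},
}
TEACHER_MEANS = {"I": 48.47, "III": 49.01, "II": 49.66, "IV": 53.26}
METHOD_MEANS = {"grouping": 50.27, "nongrouping": 49.63}
GRAND_MEAN = 49.95


def interaction_effect(teacher, method):
    """Two-way interaction residual: cell - teacher mean - method mean + grand mean."""
    return (CELL_MEANS[teacher][method] - TEACHER_MEANS[teacher]
            - METHOD_MEANS[method] + GRAND_MEAN)


effects = {
    (teacher, method): interaction_effect(teacher, method)
    for teacher in CELL_MEANS
    for method in METHOD_MEANS
}
for (teacher, method), value in effects.items():
    print(f"Teacher {teacher:<3} {method:<12} {value:+.2f}")
```

Every value agrees with Table 4 to within .01; the one-unit difference for Teacher III under nongrouping (-.85 against the printed -.86) reflects rounding in the printed means.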


Several explanations of this last rather 
surprising finding are possible. The first is 
simply that some variable such as content 
emphasis or school morale was operating 
differentially in the two schools during different semesters. Our knowledge of the
schools and teachers makes it difficult for 
us to accept this explanation. A second 
possible explanation is that the School by 
Semester Interaction is an artifact result- 
ing from the particular arrangement of 
teacher-method combinations. It is of in- 
terest to note in this connection that 
Teacher Number 4, who attained consider- 
ably greater improvement over the entire 
year, showed the largest interaction effect,
one in favor of the nongrouping procedure,
suggesting that the teacher who is suc- 
cessful in individualizing instruction may 
find grouping procedures a hindrance. Yet 
another suggestive observation is that both 
of the teachers who show an interaction ef- 
fect in favor of nongrouping are males, 
whereas the interaction effect is in favor 
of grouping for both female teachers. The 
present study cannot provide evidence as 
to which of these, or other, explanations 
is best. It is our opinion, however, that an 
explanation in terms of interaction be- 
tween method and teacher personality is, 
on logical grounds, superior to an explana- 
tion in terms of school by semester inter- 
action. 


SUMMARY


Each of four sixth-grade teachers em- 
ployed both ability grouping and non- 
grouping methods in arithmetic instruc- 
tion during one academic year. Grouping 
was based on the Arithmetic test of the 
California Achievement Tests. Students 
were tested at the beginning of the year 
and at the end of both the first and sec- 
ond semesters with equivalent forms. A 
factorial design and analysis of covariance
enabled the dimensions of teacher differ- 
ences, sequence of procedures and student 
differences to be controlled and evaluated. 
No significant difference was found be- 
tween grouping and nongrouping proce- 
dures. A significant teacher by method in- 
teraction was found, the interpretation of 
which is complicated by a significant school 
by semester interaction. Several possible 
explanations of this finding are offered. In 
addition, significant differences were found 
between students and between teachers, 
one teacher achieving considerably higher 
student performance than the others. 
USE OF VERBAL REINFORCEMENT IN DEVELOPING
GROUP DISCUSSION SKILLS 


M. RAY LOREE and MARGARET B. KOCH


Louisiana State University 


The application of principles derived 
from reinforcement learning theories to 
the task of guiding students’ attainment 
in the more complex educational objectives 
of the school presents many problems. 
How does the social science teacher who 
wishes to develop the students’ ability to 
think more effectively on social science is- 
sues go about identifying the responses or 
the classes of responses to be reinforced? 
What are the kinds of appropriate and 
effective rewards that can be manipulated 
by the teacher in the classroom? Is it pos- 
sible for the composition teacher to pro- 
vide immediate reinforcements for the 
host of responses made by the student in 
the process of writing a theme? 

Certain research findings suggest pos- 
sible avenues of investigation that may 
shed light on the above questions. The 
technique employed by Bloom and Broder 
(1950) constitutes one approach to the 
problem of identifying adequate response 
patterns in complex verbal problem solv- 
ing. Responses of “good” and “poor” prob- 
lem solvers as they “thought aloud” while 
working on examination questions were 
compared and analyzed and crucial differ- 
ences in their problem solving processes 
were identified. Some technique designed 
to identify adequate responses would seem 
to be a first prerequisite to any effort to 
utilize reinforcement concepts in guiding 
problem solving type learning. For certain 
kinds of complex learning a second pre- 
requisite may be the identification of what 
Skinner (1953) refers to as the “contin- 
gencies of reinforcement”—the responses 
that need to be reinforced in order to in- 
crease the probability of the occurrence 
of the desired response pattern. 

What rewards may be manipulated by 
the teacher in order to reinforce appro- 


priate responses? Obviously the rewards 
most frequently used in simple learning 
experiments, i.e., food and water, have 
little if any appropriateness for the class- 
room. However, it is well established 
that another form of reward, “knowledge
of results,” facilitates learning (Wolfe,
1951). Studies on the conditioning of ver- 
bal behavior indicate that simple verbal 
cues such as “good” or “mmm-hmm” may 
serve to reinforce specific or even gener- 
alized response patterns, and that this re- 
inforcement occurs even when the subjects 
of the experiment are not aware that 
certain of their responses are being re- 
inforced (Krasner, 1958; Salzinger, 1959). 
Additional possibilities occur when stu- 
dents are working in groups. Because in- 
dividuals differ in personality, personal 
goals and in whether they are task ori- 
ented, self-oriented, or interaction oriented 
(Fouriezos, Hutt, & Guetzkow, 1950), it
may be expected that in a group problem 
solving situation, what constitutes rein- 
forcement will vary from individual to in- 
dividual. 

The relative merits for school learning 
of immediate as opposed to delayed re- 
inforcement have not been fully explored, 
although immediate reinforcement is usu- 
ally assumed to be the more effective. 
However, it is difficult to implement im- 
mediate reinforcement for some kinds of 
school learning. The “teaching machines” 
described by Skinner (1958) represent one 
promising approach. The “stimulated re- 
call” technique described by Bloom (1953) 
may be a means through which immediate 
reinforcement conditions may be simu- 
lated. Bloom reports that students, listen- 
ing to a play-back of discussion and lec- 
ture sessions in which they participated, 
seem to relive the experience. Utilizing the 
“stimulated recall” technique in a group
problem solving session, it should be pos- 
sible to simulate immediate reinforcement 
conditions by interrupting the playback 
and providing verbal cues whenever a 
member of the group exemplifies good 
problem solving behavior. 


PURPOSE


The purpose of the present study is to 
investigate the effectiveness of positive re- 
inforcement in developing certain group 
discussion competencies of students. An 
effort was made to simulate “immediate” 
reinforcement through the use of the stim- 
ulated recall technique. Topics such as: 
“What are the possibilities of raising a 
child’s I.Q.?” “How can we prevent forgetting?” “Of what psychological significance are the body changes that take place
during adolescence?” were used for group 
discussion. References for each topic were 
provided. Group discussion competencies 
were assessed on the basis of performance 
on a 10-min. panel discussion. 


SUBJECTS


Ninety-six undergraduate students en- 
rolled in four educational psychology 
classes at Louisiana State University con- 
stituted the Ss in this study. 


PROCEDURE 


Each of the four classes was divided into 
six panel discussion groups of four members each. Students were assigned to discussion groups in such a way that the mean grade-point average was approximately equal across groups (the range in group mean grade-point average for the 24 groups was 1.5 to 1.6).
One student with a very good academic 
record was assigned to each group and was 
designated as a group leader. 
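The paper does not say how the balanced groups were formed. One simple way to meet both stated constraints (one strong student per group, roughly equal group mean grade-point average) is a leader-first "snake" assignment, sketched below. The function name `assign_groups` and the snake ordering are our own assumptions for illustration, not the authors' procedure.

```python
from statistics import mean


def assign_groups(gpas, n_groups):
    """Hypothetical balancing: the n_groups highest-GPA students become leaders,
    one per group; the rest are dealt out in alternating (snake) order."""
    order = sorted(range(len(gpas)), key=lambda i: gpas[i], reverse=True)
    groups = [[i] for i in order[:n_groups]]  # leader of group g has the g-th best GPA
    remaining = order[n_groups:]
    for pass_number, start in enumerate(range(0, len(remaining), n_groups)):
        targets = list(range(n_groups))
        if pass_number % 2 == 0:
            targets.reverse()  # the group with the weakest leader picks first
        for g, student in zip(targets, remaining[start:start + n_groups]):
            groups[g].append(student)
    return groups


# Hypothetical class of 24 students (not the study's data): six groups of four.
gpas = [round(1.0 + 0.1 * i, 1) for i in range(24)]
groups = assign_groups(gpas, n_groups=6)
group_means = [mean(gpas[i] for i in g) for g in groups]
print([round(m, 2) for m in group_means])
```

Alternating the direction of each pass keeps any one group from always receiving the strongest remaining student, which is what pulls the group means together.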

The experimental instructional condi- 
tions for each class were assigned thus: 
the two classes of one instructor were 
designated as the control nonpractice 
groups; on the basis of a coin toss, one 
class of a second instructor was designated
as the experimental reinforced groups 
while the second class of this instructor 
was designated as the experimental prac- 
tice groups. 

Six pairs of discussion topics were se- 
lected for use in initial and final discus- 
sion sessions. For each topic assigned, a 
number of references were listed.* An effort was made to equate type of content
and difficulty level for each pair of topics. 
For example, Topic 1 was: “Are newer 
methods of teaching reading superior to 
older methods?” while the paired Topic 
2 was: “What is the evidence concerning 
the effectiveness of newer methods of 
teaching arithmetic?” One experimental 
reinforced group (A) and one control 
group (B) were assigned Topic 1 for their 
initial panel and Topic 2 for their final 
panel. An experimental practice group (C) 
and one control group (D) were assigned 
Topic 2 initially and Topic 1 for their final 
panels. 

For the initial panel discussion sessions, 
the same procedure was followed in each 
of the four classes. After the instructor ex- 
plained in each class that an opportunity 
would be provided in the course for stu- 
dents to develop their skills in applying 
psychology to educational problems, the 
panel discussion procedure was explained 
and each student was assigned to a dis- 
cussion group. Topics were assigned and 
each group met for a few minutes to di- 
vide up the suggested references. Students 
were requested to have their readings on 
the topic completed for a class session a 
week hence, when they would be given a 
class period to plan for a 10-min. panel discussion that was to be tape recorded. The whole procedure was presented in terms of an instructional technique to reach important educational objectives of the course rather than in terms of a research study. Two days after the class period devoted to planning, groups were scheduled to meet in offices and carry out their 10-min. panel discussions. At this point in the experiment the procedure was varied for each of the four classes.

*The topics used along with their references may be found under “Discussion Questions” in M. R. Loree (Ed.), Educational psychology. New York: Ronald Press, 1959. Topic number, question number, and page reference for each of the twelve topics are as follows: 1-2-30; 2-3-30; 3-1-101; 4-2-101; 5-1-138; 6-2-138; 7-2-275; 8-5-275; 9-1-328; 10-4-328; 11-2-328; 12-3-328.

For the control nonpractice classes no 
further panel discussions were assigned 
until the final panel toward the end of the 
semester. It was thought that possibly a 
course in educational psychology might 
contribute to skillful performance in panel
discussions on topics in educational psy- 
chology. For the two experimental classes 
two additional panel discussion topics were 
assigned to each group during the semes- 
ter. The topics were of a similar nature 
to those used in the initial and final panel 
discussions. The method of assigning and 
carrying out the additional panel discus- 
sions was identical for both experimental 
classes. The difference in treatment was 
in having each group in the reinforced 
class listen to a playback of its panel. All 
four classes were informed that their per- 
formance in the final panel would be 
graded and counted as part of their final 
course grade. 

In the experimental practice class, after 
each of the panel discussion sessions, the 
instructor talked only in general terms 
about the performance of the groups on 
the panel. However, quite specific infor- 
mation was given to the class on the char- 
acteristics of a good panel discussion. The 
categories, later used in rating transcripts 
of initial and final panels, were discussed. 
It was explained that a good panel dis- 
cussion contained an introduction, a main 
body, and a conclusion. Each of these as- 
pects of a panel discussion was discussed 
in detail. It was explained, for example, 
that an introduction serves the purpose of 
making clear the kind of information 


needed to discuss the topic; that some- 
times there was need to define key terms 
of the topic; and sometimes it was advis- 
able to delimit the topic. No playback of 
the panel was provided, however, and spe- 
cific knowledge of results was not provided 
to each group. 

The instructor in the experimental re- 
inforced class scheduled each of the groups 
to meet in an office to hear a playback of 
the panel discussion. During the playback, 
the instructor stopped the recording at key 
points—whenever a panelist was caught 
employing good panel discussion tech- 
niques. The instructor would then make 
comments such as: 


Good, you have defined a key term in this 
topic. 

Good, your report of this research study is 
succinct, yet you have brought out the in- 
formation that is relevant. 

Notice how the question asked by the last 
speaker leads to a clarification of the point 
previously made. 


Only positive reinforcement was provided, 
although after the members of the group 
had listened to the complete playback an 
evaluation of the panel as a whole was 
given and weaknesses were pointed out. 


Treatment of Data 


Transcripts of the 24 initial and the 24 
final discussion sessions constitute the raw 
data for this study. These transcripts were 
coded so that raters could not tell whether 
the discussion represented an initial or a 
final panel, a panel of a reinforced group, 
a practice group, or a control group. 

A panel discussion rating form was de- 
vised that included the following catego- 
ries: 

Part I: Introduction

Part IIA: Main Body ratings for individual panelists, including ratings on:
1. Extracting relevant material from references
2. Reporting accurately contents of reference
3. Avoiding irrelevant details
4. Identifying source of evidence
5. Utilizing experimental evidence rather than personal opinion

Part IIB: Main Body ratings on the panel as a whole, including ratings on:
1. Interpreting the reference material
2. Organizing materials
3. Interacting rather than merely presenting a series of speeches

Part III: Summary, including ratings on:
1. Identifying essential elements of the problem
2. Referring to the major findings contained in the main body
3. Evaluating the completeness of the discussion
Each of the 12 categories was rated on a 
13 point scale except the category “iden- 
tifying source”. This category was rated on 
a three point scale. The individual Main 
Body ratings were averaged for each panel. 
Thus, it was theoretically possible for a 
panel to obtain a total rating as high as 
146 points. 
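The 146-point ceiling follows from the rating form: the four parts contain 12 categories in all, eleven scored on a 13-point scale and one ("identifying source") on a three-point scale. The sketch assumes the top score on each scale equals its number of points; averaging the individual Main Body ratings over panelists does not change the maximum.

```python
# Number of rated categories in each part of the panel discussion rating form.
RATING_FORM = {
    "Part I (Introduction)": 1,
    "Part IIA (Main Body, individual panelists)": 5,
    "Part IIB (Main Body, panel as a whole)": 3,
    "Part III (Summary)": 3,
}
THIRTEEN_POINT_MAX = 13  # assumed top score on a 13-point scale
THREE_POINT_MAX = 3      # "identifying source" is the only 3-point category

n_categories = sum(RATING_FORM.values())
max_total = (n_categories - 1) * THIRTEEN_POINT_MAX + THREE_POINT_MAX
print(n_categories, max_total)  # 12 146
```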

Prior to the actual rating of the tran- 
scripts, two graduate students were each 
given six of the discussion topics and asked 
to extract for each listed reference the 
material relevant to the discussion topics. 
The principal investigators then took these 
summaries and defined as explicitly as 
possible a scoring system for each cate- 
gory. For the relevancy score, for example, 
six major points might be extracted from 
a particular reference. The numerical rat- 
ings to be assigned for all possible com- 
binations of the six major points were 
then agreed upon by the two principal 
investigators. The task of the two raters 
became then to independently rate each 
panel utilizing the agreed upon rating 
plan. The total time consumed by the two 
principal investigators in rating was about 
200 hours. A Pearson product-moment 
correlation coefficient of .972 was obtained 
for the independent ratings. After the four 
transcripts for one topic were rated, the 
two raters went over their ratings together
and, where there were discrepancies, came 
to an agreement rating. 
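The inter-rater figure of .972 is an ordinary Pearson product-moment correlation between the two raters' independent totals for the same panels. A self-contained version is sketched below; the ratings are invented for illustration only and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Invented panel totals from two raters, for illustration only.
RATER_1 = [70, 85, 102, 66, 94]
RATER_2 = [72, 83, 104, 65, 95]
print(f"r = {pearson_r(RATER_1, RATER_2):.3f}")
```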


RESULTS 


Although groups were equated for ac- 
ademic ability, there was no way to equate 
groups for their initial ability to perform 
in a panel discussion. An analysis of vari- 
ance of the four groups indicates that, al- 
though differences on the initial panel dis- 
cussions exist in the means of the groups, 
these differences are not significant (F = 
00). However, to adjust the final scores 
for the small differences existing, an analy- 
sis of covariance was used (Edwards, 
1950). This analysis indicates that the 
difference between groups is significant 
(F = 15.90, df = 3 and 19, p < .01).
Table 1 shows the initial, final, and ad- 
justed means for the six panels in each of 
the four classes. 
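Covariance adjustment shifts each group's final mean by the pooled within-group regression slope times the group's distance from the overall initial mean. The slope is not reported; the value 0.41 used below is an assumption back-solved from the initial, final, and adjusted means in Table 1 (the four groups each imply a slope between about 0.40 and 0.43), so this is a consistency check rather than the authors' computation. Panels are the units (six per class, 24 in all), which matches the reported 3 and 19 df: 24 panels minus 4 groups minus 1 covariate.

```python
# Group means from Table 1, in the column order printed there
# (first column = reinforced groups). Six panels per class.
INITIAL_MEANS = [70.5, 66.0, 65.0, 70.2]
FINAL_MEANS = [103.5, 68.5, 67.7, 70.2]
B_WITHIN = 0.41  # assumed pooled within-group slope (not reported in the paper)

# Equal group sizes, so the grand initial mean is the mean of the group means.
grand_initial = sum(INITIAL_MEANS) / len(INITIAL_MEANS)

adjusted = [
    final - B_WITHIN * (initial - grand_initial)
    for initial, final in zip(INITIAL_MEANS, FINAL_MEANS)
]
print([round(a, 1) for a in adjusted])
```

With this assumed slope the adjusted means round to 102.4, 69.3, 68.9, and 69.3, the values printed in Table 1.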


SUMMARY


This study is concerned with the effect 
of positive reinforcement in developing 
certain group discussion competencies. 

Each of four educational psychology 
classes was divided into six panel discus- 
sion groups. Each group conducted a panel 
discussion at the beginning of the semester 
and, on a matched topic, at the end of the 
semester. All panel discussions were tape recorded and later transcribed, assigned a code number to ensure anonymity, and rated independently by two judges. An agreement rating was obtained where discrepancies in judgments between raters occurred. The agreement ratings on the final panel discussions constitute the dependent variable for this study. The independent variable is the type of training provided between initial and final panel discussions. No additional practice in participating in panel discussions was provided for the groups in the two “control” classes. Two additional panel discussions were conducted by the groups in the two experimental classes. For the “practice” groups the discussions were tape recorded but not played back to the students nor evaluated as to their quality. Discussions of the “reinforced” groups were tape recorded and later played back to the panelists in order to stimulate recall of their performances and provide a setting in which immediate reinforcement could be simulated. During the playback the instructor stopped the recordings at points when a panelist was performing well and briefly commented on the merit of the panelist’s contribution.

Results of an analysis of covariance support the hypothesis that group discussion competencies of students can be considerably improved by simulating “immediate” reward conditions through the use of stimulated recall techniques. Practice without “immediate” reward conditions had no significant influence on group discussion abilities.

TABLE 1
INITIAL AND FINAL MEANS AND STANDARD DEVIATIONS OF REINFORCED, PRACTICE, AND CONTROL CLASSES

Measure | Reinforced Groups | Practice Groups | Control Groups (1) | Control Groups (2)
Initial Means | 70.5 | 66.0 | 65.0 | 70.2
SD | 18.6 | 6.7 | 11.7 | 7.1
Final Means | 103.5 | 68.5 | 67.7 | 70.2
SD | 12.6 | 12.0 | 8.4 | 8.0
Adjusted Means | 102.4 | 69.3 | 68.9 | 69.3
RELATIONSHIPS BETWEEN PUPIL ACHIEVEMENT, PUPIL
AFFECT-NEED, TEACHER WARMTH, AND 
TEACHER PERMISSIVENESS 


C. M. CHRISTENSEN 
University of Alberta*


Guidance or direction of pupil learning 
and general arousal or motivation of pu- 
pils are, in the present study, considered 
two important factors influencing the 
achievement of pupils. The directiveness- 
permissiveness (permissiveness) of the 
teacher in teaching subject matter is con- 
sidered an aspect of the guidance factor. 
The affective response (warmth) of the 
teacher to the pupil and affect-need of the 
pupil are considered significant aspects of 
motivation. 

The study of teacher permissiveness and 
warmth as separate dimensions is likely 
to help clarify teaching process problems. 
Cronbach (1954, Ch. 15) makes a similar 
distinction and points out that there has 
been a tendency to equate warmth with 
permissiveness and harshness with direc- 
tiveness. Casual observation suggests that 
it is possible for a warm teacher to be 
either permissive or directive. Similarly, a 
harsh teacher might be either permissive 
or directive in her teaching of subject mat- 
ter. When permissiveness is considered 
separately from warmth, it is plausible to 
hypothesize that permissiveness will be 
negatively related to achievement of pu- 
pils. 

Regarding warmth of teacher as a moti- 
vational factor, a positive response by the 
teacher is expected to result in optimum 
arousal or motivation, whereas an indif- 
ferent or negative response is expected to 
result in either less than or greater than optimum motivation. Literature in the area of psychotherapy could be cited to point up the general reactions of individuals to the affective response of others.

*This study was carried out while the author was at the New York State Education Department. The assistance of P. A. Cowen, former acting assistant commissioner of the Research Division, is gratefully acknowledged.

The effect of the teacher’s affective re- 
sponse on the pupil would be modified to 
some extent by the needs of the pupil. For 
example, pupils with high affect-need 
would probably react differently to teacher 
responses than pupils with high cognitive- 
need (Della Piana & Gage, 1955). This, of 
course, would have a bearing on the ef- 
fectiveness of teacher response as a moti- 
vational or arousal factor. 

The present study was concerned with 
pupil achievement as the dependent vari- 
able and pupil affect-need, teacher warmth, 
and teacher permissiveness as independent 
variables. The following hypotheses were 
tested: 

1. Positive affective response (warmth) 
of teacher is positively related to achieve- 
ment gains. 

2. Permissiveness of teacher is negatively related to achievement gains.

3. Teacher warmth and permissiveness interact significantly such that warm, directive teachers will produce the greatest achievement gains.

4. Affective needs of pupils interact significantly with teacher warmth and permissiveness.


METHOD 


Subjects 


Subjects included 10 fifth-grade classes 
of pupils, 10 fourth-grade classes of pupils, 
and 10 fourth-grade teachers. This repre- 
sented all the fourth- and fifth-grade pu- 
pils, and all the fourth-grade teachers in 
one New York state suburban school system.


Instruments 


Brief descriptions of the instruments 
used in this study are given below. Some 
of the items included in the Warmth and 
Permissiveness scales were devised by the 
writer, others were selected from existing 
scales (Della Piana, 1953; Leeds, 1950; 
Leeds, 1954; Medley & Klein, 1956). 

Permissiveness. The Permissiveness scale 
attempts to measure the permissive-direc- 
tive behavior of the teacher in her teach- 
ing of subject matter. Forty of the follow- 
ing kinds of items were included: 

Do the pupils usually help plan what the 
class is going to do? 

When you work arithmetic problems do 
you have to show all your work? 

Does your teacher make assignments 
every day? 

Does your teacher assign the pages to 
be read in your science book? 

Does your teacher push some pupils to 
try a little harder? 

Pupils responded yes or no to all items. 
The teacher score was obtained by com- 
puting a mean of the pupil scores. High 
teacher scores represent greater permis- 
siveness. 

Warmth. The Warmth scale attempts 
to measure the affective response of the 
teacher to pupils. Forty of the following 
kinds of items were included: 

Does your teacher ever laugh and joke 
with the pupils? 

Is it easy to talk to your teacher when 
you feel bad about something? 

Is your teacher easily annoyed or both- 
ered? 

If you made a mistake would you be 
afraid to tell your teacher about it? 

Does your teacher ever say mean things 
to the pupils? 

Like the Permissiveness scale, pupils re- 
sponded yes or no to all items and the 
teacher score was obtained by computing 
a mean of the pupil scores. Higher teacher
scores represent more positive affective 
response. 
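Both scales turn yes/no answers into a teacher score by averaging pupil scores within a class. A minimal sketch, assuming each pupil's score is the number of items answered in the keyed (permissive or warm) direction; the keying of the sample items and the function names below are illustrative assumptions, since the paper does not print a scoring key.

```python
from statistics import mean


def pupil_score(answers, keyed_yes):
    """Count items answered in the keyed direction (True = 'yes')."""
    return sum(1 for answer, key in zip(answers, keyed_yes) if answer == key)


def teacher_score(class_answers, keyed_yes):
    """Teacher score = mean of the pupil scores in that teacher's class."""
    return mean(pupil_score(answers, keyed_yes) for answers in class_answers)


# Assumed keying for the five sample Permissiveness items quoted above:
# "yes" counts as permissive only for "Do the pupils usually help plan ...?".
PERMISSIVE_KEY = [True, False, False, False, False]

# Three hypothetical pupils' yes/no answers to those items.
class_answers = [
    [True, True, True, True, True],
    [False, False, False, False, False],
    [True, False, True, False, True],
]
print(teacher_score(class_answers, PERMISSIVE_KEY))
```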

Affect-Need. A cognitive-affective scale*
devised by Della Piana and Gage (1955) 
was used to measure affect-need of pupils. 
The scale consists of 36 paired-comparison 
items of the following kind: 

Which do you want more in a teacher? 
1A. Explains so we can understand. 
1B. Is nice to us even if we do something wrong.

In the present study a high score re- 
flects high affect-need. 

Achievement. The Iowa Tests of Basic 
Skills were used to measure pupil growth 
in achievement. This is a standardized 
achievement battery including the following five subtests: (a) Vocabulary, (b)
Reading Comprehension, (c) Language 
Skills, (d) Work-Study Skills, and (e) 
Arithmetic Skills. The battery also pro- 
vides one overall score called a composite 
score. 


Collection of Data 


The Warmth, Permissiveness, and Af- 
fect-Need scales were administered to the 
fourth-grade pupils. All items were read 
to the pupils by the writer. The teacher 
was not present during the administration 
of the scales. 

The Affect-Need scale was administered, 
in a similar fashion, to the ten classes of 
fifth-grade pupils. In addition the Iowa 
Tests of Basic Skills had been administered 
to these pupils, during the first month of 
the fourth grade and the first month of 
the fifth grade. This permitted the meas- 
urement of growth in achievement for the 
present fifth-grade pupils during their stay 
in the fourth grade. 
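Because achievement growth comes from last year's fourth graders while the Warmth and Permissiveness ratings come from this year's, each pupil's gain has to be attached to the fourth-grade teacher who taught that pupil. A minimal sketch, assuming gain is simply the fifth-grade fall score minus the fourth-grade fall score on the same Iowa Tests of Basic Skills measure (Table 3 analyzes adjusted scores, whose adjustment is not described in this section); the records and names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_gain_by_teacher(records):
    """records: (pupil_id, fourth_grade_teacher, fall_grade4_score, fall_grade5_score)."""
    gains = defaultdict(list)
    for _pupil_id, teacher, fall_grade4, fall_grade5 in records:
        gains[teacher].append(fall_grade5 - fall_grade4)
    return {teacher: mean(values) for teacher, values in gains.items()}


# Hypothetical composite scores, not the study's data.
RECORDS = [
    ("p01", "Teacher A", 40, 50),
    ("p02", "Teacher A", 45, 51),
    ("p03", "Teacher B", 50, 55),
]
gains_by_teacher = mean_gain_by_teacher(RECORDS)
print(gains_by_teacher)
```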

It should be noted that the teacher Warmth and Permissiveness scores were derived from the fourth-grade pupil responses and growth in achievement was derived from the previous year’s fourth-grade pupils who were now in the fifth grade. For the statistical analyses, the fifth-grade pupil achievement scores were grouped according to their fourth-grade teachers.

* Thanks are due N. L. Gage for providing a copy of the scale.


RESULTS 


Characteristics of Scales 


TABLE 1
ANALYSIS OF VARIANCE OF PERMISSIVENESS AND WARMTH SCORES OBTAINED FROM TEN CLASSES OF PUPILS

Source | df | Permissiveness MS | Permissiveness F | Warmth MS | Warmth F
Among teachers | 9 | 96.00 | 15.74* | 262.56 | 11.30*
Within classes | 217 | 6.10 | | 23.23 |
Total | 226 | | | |

* Significant at .01 level.

TABLE 2
PRODUCT MOMENT CORRELATIONS BETWEEN AFFECT-NEED, PERMISSIVENESS, AND WARMTH SCORES OF 103 BOYS AND 121 GIRLS

Scale | Warmth (Boys) | Warmth (Girls) | Permissiveness (Boys) | Permissiveness (Girls)
Affect-Need | -.17 | -.06 | .09 | .02
Warmth | | | .32* | .25*

* Significant at .01 level.

Table 1 gives evidence that the teachers differ significantly on both the Warmth and Permissiveness scales. Analyses of variance yielded significantly greater among-teacher variances than within-classes variances. The variance for pupils within classes is small compared to the variance computed for teacher scores based on mean pupil ratings.

Following a procedure used by Mitzel and Medley (1957), reliability coefficients for teacher scores on the Warmth and Permissiveness scales were estimated from the data given in Table 1. With this procedure, among-teacher variance is used as an estimate of obtained score variance, and among-teacher variance minus within-classes variance as an estimate of true score variance. Reliability is defined as the ratio of true score variance to obtained score variance. The reliability coefficient for the Warmth scale was .91 and for the Permissiveness scale it was .94. This indicates that other classes of similar pupils would rank the teachers in almost the same order as the classes who rated the teachers in the present study.

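The reliability estimate described above can be reproduced from the mean squares in Table 1. The Python sketch below is only an illustration: the function names are hypothetical, and the code simply applies the stated rule (among-teacher mean square minus within-classes mean square, divided by the among-teacher mean square) alongside the F ratio reported in the table.

```python
# Reliability of teacher scores (class-mean ratings), following the rule in
# the text (after Mitzel and Medley, 1957):
#   obtained-score variance ~ among-teacher mean square
#   true-score variance     ~ among-teacher MS minus within-classes MS
# Mean squares are copied from Table 1; function names are illustrative.

def class_mean_reliability(ms_among: float, ms_within: float) -> float:
    """Ratio of estimated true-score variance to obtained-score variance."""
    return (ms_among - ms_within) / ms_among


def f_ratio(ms_among: float, ms_within: float) -> float:
    """F statistic for the among-teacher effect."""
    return ms_among / ms_within


TABLE_1 = {
    # scale: (among-teacher MS, within-classes MS)
    "Permissiveness": (96.00, 6.10),
    "Warmth": (262.56, 23.23),
}

for scale, (ms_among, ms_within) in TABLE_1.items():
    r = class_mean_reliability(ms_among, ms_within)
    f = f_ratio(ms_among, ms_within)
    print(f"{scale}: reliability = {r:.2f}, F = {f:.2f}")
```

Because the ratio equals 1 - 1/F, the large F ratios in Table 1 translate directly into the high reliabilities (.94 and .91) reported for the teacher scores.
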
Independence of the Warmth and Per- 
missiveness scales was also considered. An 
adequate test of independence should be
based on the teacher scores, that is, the
mean of the pupils' ratings of each
teacher. The rank correlation between
Warmth and Permissiveness for ten teach- 
ers was .39. Although a rank correlation 
of this magnitude is not significant, the 
number of cases is too small for a rigorous 
test. 
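
To show why ten teachers give so little power, the sketch below applies the usual t approximation for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 = 8 degrees of freedom, to the rank correlation of .39. This is an assumption made for illustration: the text does not say which significance test was used, and exact tables for a rank correlation with n = 10 can differ slightly from the approximation. The critical t of 2.306 (two-tailed, .05 level, 8 df) is taken from a standard t table; the function names are hypothetical.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing a correlation against zero with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| whose t statistic reaches t_crit with n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(t_crit * t_crit + df)


T_CRIT_05_DF8 = 2.306  # two-tailed .05 critical t for 8 df (standard t table)
N_TEACHERS = 10
RANK_R = 0.39          # Warmth-Permissiveness rank correlation from the text

t_obs = t_for_correlation(RANK_R, N_TEACHERS)
r_needed = critical_r(T_CRIT_05_DF8, N_TEACHERS)
print(round(t_obs, 2))     # 1.2
print(round(r_needed, 2))  # 0.63
```

Under this approximation a rank correlation would need to reach roughly .63 to be significant at the .05 level with ten teachers, so the observed .39 is far from settling the question of independence either way.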

Correlations between Warmth, Permissiveness,
and Affect-Need, computed separately for
boys and girls, are given in Table 2. These
correlations were based on
individual pupil responses. Low positive 
but significant correlations were obtained 
between Warmth and Permissiveness. Af- 
fect-Need did not correlate significantly 
with either Warmth or Permissiveness. 
There is no evidence of sex differences. 
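
The significance markings in Table 2 can be checked with the same t transformation of r, using n - 2 degrees of freedom for the 103 boys and 121 girls. The sketch below assumes a two-tailed test and uses 2.626, the tabled .01 critical value of t for 100 degrees of freedom; because the critical value shrinks as degrees of freedom grow, this is slightly conservative for 101 and 119 df. The dictionary layout and names are illustrative only.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing a product-moment r against zero (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


T_CRIT_01_DF100 = 2.626  # two-tailed .01 critical t, 100 df (standard t table)

# Correlations from Table 2: (variable, variable, group) -> (r, n)
TABLE_2 = {
    ("Affect-Need", "Warmth", "boys"): (-0.17, 103),
    ("Affect-Need", "Warmth", "girls"): (-0.06, 121),
    ("Affect-Need", "Permissiveness", "boys"): (0.09, 103),
    ("Affect-Need", "Permissiveness", "girls"): (0.02, 121),
    ("Warmth", "Permissiveness", "boys"): (0.32, 103),
    ("Warmth", "Permissiveness", "girls"): (0.25, 121),
}

significant = {}
for (a, b, group), (r, n) in TABLE_2.items():
    t = t_for_correlation(r, n)
    significant[(a, b, group)] = abs(t) > T_CRIT_01_DF100
    print(f"{a} x {b} ({group}): r = {r:+.2f}, t = {t:+.2f}, "
          f"significant at .01: {significant[(a, b, group)]}")
```

Only the two Warmth-Permissiveness correlations clear the threshold, which agrees with the asterisks in Table 2.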

Della Piana and Gage (1955) reported a 
Horst reliability coefficient of .74 for the 
Affect-Need scale. 


Tests of Hypotheses 


The mean Warmth score and mean Per- 
missiveness score were obtained for each 
of the ten teachers. Eight of the teachers 
with the most extreme scores were classi- 
fied in four groups, two teachers to a 
group. The four groups resulting were as 
follows: High Permissiveness and High 
Warmth, High Warmth and Low Permissiveness,
Low Warmth and High Permissiveness, and
Low Warmth and Low Permissiveness. The
pupils of the teachers in each of the groups
were subdivided in terms of Affect-Need
scores. For each group of teachers, ten high
Affect-Need and ten low Affect-Need pupils
were selected. This yielded a 2 x 2 x 2
factorial design with 10 individuals per cell.

Covariance analysis was used to test the
hypotheses. Beginning fourth-grade
achievement scores were used to adjust the
beginning fifth-grade achievement scores:
the fifth-grade score predicted from each
pupil's fourth-grade score was subtracted
from the pupil's actual beginning fifth-grade
score. The scores used in the analyses can
therefore be thought of as growth in
achievement.

Table 3 gives a summary of the analyses for
each of the subtests and the composite score
of the Iowa Tests of Basic Skills. Only two
significant results were obtained: growth in
vocabulary and in arithmetic achievement was
significantly greater for pupils of teachers
scoring high on the Warmth scale. No
significant relationships were obtained for
Affect-Need or Permissiveness, and none of
the interaction terms was significant. Only
the first hypothesis was partially
substantiated.


TABLE 3
ANALYSIS OF VARIANCE OF ADJUSTED ACHIEVEMENT TEST SCORES

Source         | df | Vocabulary     | Reading Comp.  | Language       | Work-Study     | Arithmetic     | Composite      |
               |    | MS     | F     | MS     | F     | MS     | F     | MS     | F     | MS     | F     | MS     | F     |
Permissiveness |  1 |   9.59 |       |  71.52 |       |  17.19 |       |  33.65 | 1.26  |   2.02 |       |   5.40 |       |
Warmth         |  1 | 259.44 | 4.65* |  17.78 |       | 156.97 | 2.22  |  30.04 | 1.12  | 113.36 | 6.38* |  49.39 | 3.16  |
Affect-Need    |  1 |  19.65 |       | 301.55 | 3.55  |  24.22 |       |   5.26 |       |  24.83 | 1.40  |  21.52 | 1.38  |
P x W          |  1 |    .42 |       | 130.36 | 1.53  |  44.54 |       |    .12 |       |   1.42 |       |    .51 |       |
P x A          |  1 | 100.58 | 1.80  |  13.88 |       |   1.11 |       |   4.87 |       |    .31 |       |   2.38 |       |
W x A          |  1 |   2.71 |       |  97.77 | 1.15  |  27.45 |       |  52.48 | 1.97  |  25.63 | 1.44  |  25.23 | 1.61  |
P x W x A      |  1 |  32.51 |       |  29.91 |       |  24.46 |       |  53.10 | 1.99  |  29.18 | 1.64  |   1.71 |       |
Error          | 71 |  55.76 |       |  85.02 |       |  70.56 |       |  26.70 |       |  17.77 |       |  15.64 |       |
Total          | 78 |        |       |        |       |        |       |        |       |        |       |        |       |

P = Permissiveness; W = Warmth; A = Affect-Need.
* Significant at .05 level.


DISCUSSION


Evidence from the present study sup- 
ports the notion that warmth and permis- 
siveness can be studied as separate dimen- 
sions. The correlation between the Warmth 
and Permissiveness scales was low and 
both scales discriminated between the 
teachers. The high reliability coefficients 
obtained indicate that pupils within a class 
are consistent in describing their teacher. 
Pupils within a class can be regarded as
several observers rating one individual.
Assuming that errors of observation include
both over- and underestimation, these errors
will tend to cancel, so that a score based on
the mean of the ratings will provide a more
accurate measure than any single pupil's
rating.

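The argument that averaging many pupils' ratings yields an accurate teacher score can be illustrated with a small simulation. The Python sketch below is hypothetical throughout: it assumes each pupil's rating equals the teacher's true score plus an independent, unbiased error, and it uses a class size of 22, close to the average implied by the degrees of freedom in Table 1 (227 pupils in ten classes). It compares the error of one pupil's rating with the error of the class mean.

```python
import random


def rmse(errors: list[float]) -> float:
    """Root-mean-square of a list of errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


rng = random.Random(1)   # fixed seed so the sketch is repeatable
N_TEACHERS = 1000        # hypothetical number of simulated teachers
CLASS_SIZE = 22          # hypothetical; close to the average class in Table 1
ERROR_SD = 1.0           # hypothetical spread of one pupil's rating error

single_errors, mean_errors = [], []
for _ in range(N_TEACHERS):
    true_score = rng.gauss(0.0, 1.0)
    # each pupil over- or underestimates independently and without bias
    ratings = [true_score + rng.gauss(0.0, ERROR_SD) for _ in range(CLASS_SIZE)]
    single_errors.append(ratings[0] - true_score)                # one pupil
    mean_errors.append(sum(ratings) / CLASS_SIZE - true_score)   # class mean

print(f"single rating RMSE: {rmse(single_errors):.2f}")
print(f"class mean RMSE:    {rmse(mean_errors):.2f}")
```

With independent errors the error of the mean shrinks roughly as 1/sqrt(class size), to about a fifth of a single rating's error here. A bias shared by the whole class would not shrink at all, so the sketch covers only the independent over- and underestimation assumed in the text.
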
Intercorrelations were computed among
individual pupil ratings of teachers and
the Affect-Need scores of pupils. Affect-Need
scores did not correlate significantly with
the pupil ratings of teachers. A low,
positive, but significant relationship
between Permissiveness and Warmth was
obtained. This could be interpreted either
as a tendency for pupils to regard warm
teachers as permissive or as an indication
that warm teachers are, in fact, more
permissive. The low intercorrelations may
be due to low reliability of the
scales. It should be noted that reliability 
coefficients for individual pupil ratings and 
Affect-Need scores were not computed. 
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ACHIEVEMENT AND TEACHER WARMTH AND PERMISSIVENESS 


Of the variables considered, only warmth 
of teacher was related to vocabulary and 
arithmetic achievement. Plausible explana- 
tions can be offered for this differential 
relationship. Arithmetic achievement is 
much more dependent on school factors 
than, for example, reading. Little direct 
attention is given to arithmetic in the 
home and few pupils make daily use of 
arithmetic outside of school. The relation- 
ship between vocabulary achievement and 
warmth is, however, not as obvious as in 
the case of arithmetic. A reading specialist*
has pointed out that in attempts to
improve reading, vocabulary is more
resistant to improvement than reading
comprehension. She suggested that vocabulary
depends on interest in words, which is
likely to be facilitated by a responsive 
teacher. Additional research is needed to 
clarify these interesting possibilities. 

Results support the contention that af- 
fective response of the teacher is more im- 
portant for growth in achievement than 
permissiveness. This is an interesting find- 
ing, particularly if it is further substanti- 
ated by additional research. It has theo- 
retical implications for the motivation of 
pupils and it also has implications for 
teacher training. Undoubtedly the devel- 
opment of affective responsiveness in 
teachers would call for quite different 
kinds of techniques than those now gener- 
ally employed in teacher training pro- 
grams. 

Affect-Need was in no instance related 
to achievement. This was somewhat sur- 
prising. Della Piana and Gage (1955) re- 
ported a differential relationship between 
Minnesota Teacher Attitude Inventory 
scores and pupil-rated liking of teachers 
for pupils of high and low Affect-Need. Of 
course, liking of teacher may not be re- 
lated to achievement. However, it was ex- 
pected that Affect-Need and Warmth 
would interact and influence achievement. 


*Marion Dixon Jenkinson, University of 
Alberta, personal communication. 
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A final note of caution is in order. This 
study should be regarded as exploratory 
in nature and not as definitive. It was ob- 
viously not an experiment but rather a 
study of relationships between selected 
teacher and pupil characteristics. A school 
system was entered intact and no manipu- 
lation of conditions took place. In contrast 
to an experiment, the pupils were not ran- 
domly assigned to teachers. Although some 
degree of control was obtained by the co- 
variance analysis, it is likely that other 
pupil characteristics influencing achieve- 
ment growth were operating. Random as- 
signment of pupils to teachers is probably 
the only effective way of controlling un- 
known, relevant variables. 
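
To make concrete what the covariance adjustment does and does not control, the following Python sketch computes adjusted scores in the simplest form described under Tests of Hypotheses: each pupil's beginning fifth-grade score minus the score predicted from the beginning fourth-grade score by a least-squares line. This is a simplification; a full analysis of covariance uses a pooled within-groups regression and tests group effects on adjusted means. The scores and the function name are invented for illustration and are not data from this study.

```python
from statistics import mean


def residual_growth(pre: list[float], post: list[float]) -> list[float]:
    """Actual post-test minus the post-test predicted from the pre-test by an
    ordinary least-squares line (a simplified covariance adjustment)."""
    mx, my = mean(pre), mean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical grade-equivalent scores for five pupils (not data from the study).
grade4 = [3.8, 4.1, 4.5, 4.9, 5.2]
grade5 = [4.6, 5.3, 5.4, 6.0, 6.1]
growth = residual_growth(grade4, grade5)
print([round(g, 2) for g in growth])
```

By construction the adjusted scores are uncorrelated with fourth-grade achievement, but any pupil characteristic that affects growth without being reflected in the fourth-grade score is left untouched, which is why random assignment remains the stronger control.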


SUMMARY


This study explored relationships be- 
tween permissiveness and warmth of 
teachers and affect-need and achievement 
of pupils. Permissiveness and Warmth 
scales were devised and administered to 
pupils in ten fourth-grade classes. An Af- 
fect-Need scale devised by Della Piana 
and Gage was administered to ten fifth- 
grade classes. Beginning fourth-grade and 
beginning fifth-grade achievement test 
scores were available for the fifth-grade 
pupils. The Permissiveness and Warmth 
scales were sufficiently reliable and inde- 
pendent for the purposes of this study. 
Affect-Need did not correlate significantly 
with either of the scales. A 2x2x2 fac- 
torial design with two levels of permissive- 
ness, warmth and affect-need was estab- 
lished. Covariance analysis was used to de- 
termine growth in achievement. Contrary 
to hypothesized outcomes, none of the in- 
teraction, permissiveness, or affect-need 
variances were significant. Warmth of 
teachers was significantly related to vocab- 
ulary and arithmetic achievement. 
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